To get the files there are three options:

Option one: full download from the web

This is the most data-rich option. It brings down keywords, abstracts, citation information, country of origin, etc.

  1. Go to the WoS website and run your search
  2. Note down the total number of records returned
  3. Hit ‘add to marked list’ on the first search results page
  4. Enter the range 1 to the total number of records returned to save them all
  5. Select ‘marked list’ at the top left (it should now show the number of records saved)
  6. Step 1: select the span of records to download, 1:500, 501:1000, etc. (500 at a time)
  7. Step 2: under output options choose ‘select all’ for the information brought down
  8. Step 3: choose ‘save to other file formats’ and then select BibTeX
  9. Repeat steps 6-8, 500 at a time, until all references are downloaded
  10. Move the .bib files into their own folder before starting a new search in which you download .bib files (do not mix them up in your downloads folder!)
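
If you would rather not work out the 500-record spans by hand, a minimal sketch in base R is below. The total of 1507 is only an example (it happens to be the size of the dataset analysed later on); swap in the number you noted down in step 2.

```r
# Example total; replace with the number of records your search returned.
total_records <- 1507
batch_size <- 500  # records per export, as in step 6 above

# First and last record of each batch; the last batch is capped at the total.
starts <- seq(1, total_records, by = batch_size)
ends <- pmin(starts + batch_size - 1, total_records)

batches <- data.frame(batch = seq_along(starts), from = starts, to = ends)
print(batches)
```

Each row is one pass through steps 6-8, so the number of rows is also the number of .bib files you should end up with.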

Option two: partial download using the api

This will bring down paper titles, keywords, author affiliations and some other metadata, but not citation information or country information. You can get citations per year online if the search returns fewer than 10000 records (see below).

  1. Open up woslite.py in a text editor (BBEdit (Mac) and Notepad (Windows) are good options)
  2. Edit the line that starts query = ’’ by adding, between the quotes, your search term as it reads in WoS, e.g. query = ’TS=( “HTS” OR “High throughput sequenc*" OR “shotgun”…… AND WC …’ If you are copying directly from the Excel file you will have to find and replace the special characters (“,*) in the text editor. Microsoft does not write in plain text and Python does not approve of this.
  3. If necessary, change the years. If you are on the genomic era you will want to limit this to 2005:2020. For the rest of you, put the search term into WoS online to identify the first year.
  4. Open ‘Terminal’ (Mac) or ‘Command Prompt’ (Windows) and run the script. For Windows see https://www.pythoncentral.io/execute-python-script-file-shell/ for how to run a Python script; on Macs see Eddy
  5. Running the script builds an output file called something like ‘WoS_API_download.txt’
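
If the find/replace in step 2 gets tedious, the curly quotes can also be straightened in R before pasting the term into woslite.py. This is only a sketch: `straighten_quotes` is a made-up helper name, the search term is a toy one, and it handles quotation marks only, so check the result by eye for any other odd characters.

```r
# Hypothetical helper: swap typographic quotes for their plain-text versions.
straighten_quotes <- function(x) {
  x <- gsub("\u201c", "\"", x, fixed = TRUE)  # left curly double quote
  x <- gsub("\u201d", "\"", x, fixed = TRUE)  # right curly double quote
  x <- gsub("\u2018", "'", x, fixed = TRUE)   # left curly single quote
  x <- gsub("\u2019", "'", x, fixed = TRUE)   # right curly single quote
  x
}

# Toy search term, written with curly quotes the way Excel tends to store it.
pasted <- "TS=(\u201cHTS\u201d OR \u201cshotgun\u201d)"
cleaned <- straighten_quotes(pasted)
cat(cleaned, "\n")
```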

Option three: broadscale metadata

You can get broad-scale data online without doing the 500-at-a-time download, but not keywords and other metadata.

  1. Put your search into WoS
  2. Click the analyse tab
  3. Select publication years
  4. Scroll to the bottom of the page (bottom right)
  5. Select all data rows (up to 200,000; this works because we are looking at dates, not individual records)
  6. Hit download; this will download as analyze.txt
  7. Move it to the location you want for R
  8. Open it in a text editor (e.g. BBEdit, TextEdit, Notepad)
  9. Change spaces to underscores, remove the stray lines at the end of the file that are not data, and save it under a better name (don't use spaces in the name)
  10. Go back to the WoS search, select the countries/regions tab and repeat steps 5:9 to download a file of the papers per country

You can also do a citation report and get citations per year, if the search returns fewer than 10000 records. You cannot get keywords from this (as far as I can tell anyhow).
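
The tidy-up in steps 8 and 9 can also be done in R rather than by hand. The sketch below is hedged: it assumes analyze.txt is tab-separated with a header row and ends with a blank line plus a note that is not data, and it builds a small toy file with that layout so it can run on its own. Check your own download before trusting it.

```r
# Toy stand-in for analyze.txt (the layout is an assumption, the rows are illustrative).
raw_path <- tempfile(fileext = ".txt")
writeLines(c(
  "Publication Years\trecords",
  "2018\t19",
  "2017\t23",
  "2016\t29",
  "",
  "(note line that is not data)"
), raw_path)

lines <- readLines(raw_path)

# Keep only lines with the same number of tab-separated fields as the header.
fields <- strsplit(lines, "\t", fixed = TRUE)
keep <- lengths(fields) == length(fields[[1]])

years <- read.delim(text = lines[keep], check.names = FALSE,
                    stringsAsFactors = FALSE)

# Replace spaces in the column names with underscores.
names(years) <- gsub(" ", "_", names(years), fixed = TRUE)
print(years)
```

To use it on a real download, point `raw_path` at your own file instead of writing the toy one.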

Analysing data that you download 500 at a time

First, set up R by loading the required packages and setting the working directory. Change the path in setwd to where you have saved your files.

If you failed to follow Eddy’s instructions and did not install the packages already, you will also need to install them.

#unhash this if you can't follow instructions
#install.packages('bibliometrix') 
#install.packages('tidyverse') 
#install.packages("wordcloud")
#install.packages('tm')
#install.packages('RColorBrewer')

#load packages
library(bibliometrix)
library(tidyverse)
library(wordcloud)
library(tm)
library(RColorBrewer)

setwd("~/Documents/CoAuthorMS/parasitebibsearch/parasitesonly/")
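
If you are not sure which packages you already have, a quick check like the one below reports what is missing before any library() call fails. It is a sketch using base R only; the install line is left commented out so nothing gets installed by accident.

```r
# The packages loaded above.
required <- c("bibliometrix", "tidyverse", "wordcloud", "tm", "RColorBrewer")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```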

We are going to use the bibliometrix package to load the many files from our downloads, but first we are going to modify a function from that package to streamline reading the files in (don't worry about what is going on here).

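readFilesmod <- function(...) {
  # Accepts a character vector of file paths
  arguments <- as.list(...)
  k <- length(arguments)
  D <- list()
  enc <- "UTF-8"
  # Read every file in as lines of text
  for (i in seq_len(k)) {
    D[[i]] <- suppressWarnings(readLines(arguments[[i]], encoding = enc))
  }
  # Flatten everything into one long character vector
  D <- unlist(D)
  return(D)
}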
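
If you are curious what a helper like this hands back, the sketch below does the same kind of job with lapply on two toy files (the file contents are made up and much shorter than a real .bib export). The point is that every line of every file ends up in one long character vector.

```r
# Two toy files standing in for separate .bib downloads.
dir_path <- tempfile("bibdemo")
dir.create(dir_path)
writeLines(c("@article{a,", "}"), file.path(dir_path, "savedrecs.bib"))
writeLines(c("@article{b,", "}"), file.path(dir_path, "savedrecs(1).bib"))

paths <- list.files(dir_path, pattern = "\\.bib$", full.names = TRUE)

# Read each file, then flatten the list into a single vector of lines.
all_lines <- unlist(lapply(paths, readLines, encoding = "UTF-8"))
length(all_lines)
```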

Now we are going to read our data in using the function from above. This will generate a huge character vector which is horrible to look at. We will use a function from the bibliometrix package to turn this into a table.

#pattern is a regular expression, so match names ending in .bib
file_list <- list.files(pattern = "\\.bib$", full.names = TRUE)
citations<-readFilesmod(dput(as.character(file_list))) 
## c("./savedrecs.bib", "./savedrecs(1).bib", "./savedrecs(2).bib", 
## "./savedrecs(3).bib")
citations_df <- convert2df(citations, dbsource = "isi", format = "bibtex")
## 
## Converting your isi collection into a bibliographic dataframe
## 
## Articles extracted   100 
## Articles extracted   200 
## Articles extracted   300 
## Articles extracted   400 
## Articles extracted   500 
## Articles extracted   600 
## Articles extracted   700 
## Articles extracted   800 
## Articles extracted   900 
## Articles extracted   1000 
## Articles extracted   1100 
## Articles extracted   1200 
## Articles extracted   1300 
## Articles extracted   1400 
## Articles extracted   1500 
## Articles extracted   1507 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!

Now we have a dataframe, which is much easier! You can view this to see the various fields with the authors etc.

We can also subset this dataset here, for example if we only wanted to look at records from 2000-2005 (the year is stored under PY, publication year, in the dataframe).

citations_df_2000_2005<-citations_df %>% filter(between(PY, 2000, 2005))
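
Note that between() includes both end points, so 2000 and 2005 are both kept. If you want to convince yourself, here is the same filter written in base R on a toy dataframe (the four rows are made up; only the PY column name comes from the real data).

```r
# Toy records with a publication year column named PY, as in citations_df.
toy <- data.frame(TI = c("A", "B", "C", "D"),
                  PY = c(1999, 2000, 2005, 2006),
                  stringsAsFactors = FALSE)

# Base R equivalent of filter(between(PY, 2000, 2005)); rows with NA years are dropped.
in_window <- !is.na(toy$PY) & toy$PY >= 2000 & toy$PY <= 2005
toy_2000_2005 <- toy[in_window, ]

range(toy_2000_2005$PY)
```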

We are going to use the function biblioAnalysis to turn our dataframe into an object holding various tables. None of these statistics are particularly hard to generate or special; it just does it all in one hit, which is nice! This function builds an object with various dataframes stored in it (23 dataframes). To access a dataframe you can call it directly. Remember you can view the help file for a function at any time in RStudio, e.g. ?biblioAnalysis

This transformation drops some information, so we will go back to the original table every now and then, depending on what we want to do.

citations_ana <- biblioAnalysis(citations_df, sep = ";")
#you can call them directly with:
head(citations_ana$Years)
## [1] 2018 2018 2018 2018 2018 2018
#Most of them are just straightforward transformations, e.g.
head(citations_ana$TotalCitation)
## [1] 0 0 0 1 0 0
#is just this column from the original dataframe
head(citations_df$TC)
## [1] 0 0 0 1 0 0

The handy thing about turning it into a bibliometrix object is that you can use the summary function on it. You can set the number of entries to return by changing k = X

citations_ana.sum <- summary(object = citations_ana, k = 100, pause = FALSE)
## 
## 
## Main Information about data
## 
##  Documents                             1507 
##  Sources (Journals, Books, etc.)       429 
##  Keywords Plus (ID)                    3835 
##  Author's Keywords (DE)                3011 
##  Period                                1966 - 2018 
##  Average citations per documents       30.1 
## 
##  Authors                               4639 
##  Author Appearances                    7058 
##  Authors of single authored documents  69 
##  Authors of multi authored documents   4570 
## 
##  Documents per Author                  0.325 
##  Authors per Document                  3.08 
##  Co-Authors per Documents              4.68 
##  Collaboration Index                   3.26 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1966        1
##     1968        1
##     1973        1
##     1974        1
##     1975        3
##     1977        2
##     1978        4
##     1979        3
##     1980        6
##     1981        3
##     1982        9
##     1983        7
##     1984       11
##     1985        5
##     1986        7
##     1987       15
##     1988        5
##     1989        8
##     1990       17
##     1991       67
##     1992       84
##     1993       70
##     1994       67
##     1995       66
##     1996       65
##     1997       72
##     1998       95
##     1999       82
##     2000       67
##     2001       51
##     2002       54
##     2003       57
##     2004       60
##     2005       41
##     2006       57
##     2007       42
##     2008       44
##     2009       40
##     2010       27
##     2011       27
##     2012       19
##     2013       22
##     2014       23
##     2015       28
##     2016       29
##     2017       23
##     2018       19
## 
## Annual Percentage Growth Rate 6.610257 
## 
## 
## Most Productive Authors
## 
##       Authors        Articles Authors        Articles Fractionalized
## 1   TIBAYRENC M            42 TIBAYRENC M                      12.05
## 2   PRATLONG F             41 SUPURAN CT                       10.32
## 3   SUPURAN CT             33 ANDREWS RH                        5.93
## 4   DEDET JP               29 PRATLONG F                        5.60
## 5   MATTIUCCI S            25 CLARK CG                          5.33
## 6   ANDREWS RH             23 EVANS DA                          4.48
## 7   NASCETTI G             23 DEDET JP                          4.46
## 8   MILES MA               20 MATTIUCCI S                       4.32
## 9   BARNABE C              19 PANIAGUA E                        4.25
## 10  ROMANHA AJ             19 VILAS R                           4.00
## 11  GRAMICCIA M            18 MILES MA                          3.98
## 12  DUJARDIN JP            17 NASCETTI G                        3.93
## 13  EVANS DA               17 CHILTON NB                        3.69
## 14  BRENIERE SF            15 GOKA K                            3.67
## 15  GATTI S                15 ROMANHA AJ                        3.66
## 16  SCAGLIA M              15 GRAMICCIA M                       3.62
## 17  CHIARI E               14 BARNABE C                         3.60
## 18  PANIAGUA E             14 PETRI WA                          3.48
## 19  CHILTON NB             13 AYALA FJ                          3.43
## 20  VILAS R                13 BEVERIDGE I                       3.28
## 21  POZIO E                12 BLAIR D                           3.17
## 22  TAKEUCHI T             12 EBERT F                           3.00
## 23  ALOTHMAN Z             11 FOLEY DH                          3.00
## 24  CUPOLILLO E            11 TAKAFUJI A                        3.00
## 25  GRIMALDI G             11 DUJARDIN JP                       2.91
## 26  KOBAYASHI S            11 MONIS PT                          2.87
## 27  KREUTZER RD            11 NAVAJAS M                         2.87
## 28  SNABEL V               11 BRENIERE SF                       2.76
## 29  BOSSENO MF             10 VERDYCK P                         2.70
## 30  CAPASSO C              10 MIRELMAN D                        2.62
## 31  MAYRHOFER G            10 SANMARTIN ML                      2.58
## 32  RIOUX JA               10 CHIARI E                          2.46
## 33  TAIT A                 10 POZIO E                           2.44
## 34  BEVERIDGE I             9 GATTI S                           2.42
## 35  BRUNO A                 9 SCAGLIA M                         2.42
## 36  CLARK CG                9 SCARPASSA VM                      2.42
## 37  DEL PRETE S             9 STEVENS JR                        2.35
## 38  DUJARDIN JC             9 KREUTZER RD                       2.35
## 39  GRADONI L               9 NADLER SA                         2.33
## 40  HAQUE R                 9 HAQUE R                           2.30
## 41  MICHELS PAM             9 SNABEL V                          2.28
## 42  OSMAN SM                9 JACKSON TFHG                      2.28
## 43  SCOZZAFAVA A            9 LYMBERY AJ                        2.25
## 44  SOLARI A                9 MULVEY M                          2.25
## 45  TACHIBANA H             9 MAYRHOFER G                       2.22
## 46  VARGAS F                9 CABARET J                         2.19
## 47  VULLO D                 9 MOMEN H                           2.18
## 48  YANAGI T                9 GIBSON W                          2.17
## 49  AGATSUMA T              8 GRADONI L                         2.15
## 50  ALVAR J                 8 TAKEUCHI T                        2.09
## 51  AYALA FJ                8 AGATSUMA T                        2.03
## 52  CABARET J               8 DIAMOND LS                        2.03
## 53  CEVINI C                8 EBERT D                           2.03
## 54  CIPRIANI P              8 MCMANUS DP                        2.03
## 55  FOLEY DH                8 OROZCO E                          2.03
## 56  GOKA K                  8 KOBAYASHI S                       2.02
## 57  HASHIGUCHI Y            8 SCOZZAFAVA A                      2.02
## 58  HATAM GR                8 DUFFY JE                          2.00
## 59  MOMEN H                 8 RANNALA BH                        2.00
## 60  MONIS PT                8 TAIT A                            1.94
## 61  NAVAJAS M               8 GODFREY DG                        1.92
## 62  PAOLETTI M              8 BRYAN JH                          1.92
## 63  RENAUD F                8 ARRIVILLAGA J                     1.89
## 64  SANMARTIN ML            8 OPPERDOES FR                      1.87
## 65  ANDRADE SG              7 SARGEAUNT PG                      1.87
## 66  BANULS AL               7 DESSER SS                         1.83
## 67  DARDE ML                7 LITTLE TJ                         1.83
## 68  DEREURE J               7 SOLARI A                          1.80
## 69  EY PL                   7 MONTEIRO FA                       1.78
## 70  FERNANDES O             7 YAMAZAKI Y                        1.75
## 71  FIGUEIREDO FB           7 BOSSENO MF                        1.72
## 72  GUHL F                  7 ANDRADE SG                        1.72
## 73  LANOTTE G               7 VRIJENHOEK RC                     1.70
## 74  OPPERDOES FR            7 WILLIAMS JE                       1.70
## 75  PASTEUR N               7 GRIMALDI G                        1.70
## 76  PESSON B                7 STOUTHAMER R                      1.67
## 77  PETRI WA                7 CUPOLILLO E                       1.66
## 78  RAVEL C                 7 HATAM GR                          1.64
## 79  STEVENS JR              7 DARDE ML                          1.62
## 80  TAKAFUJI A              7 PASTEUR N                         1.62
## 81  AREVALO J               6 PHILLIPS CB                       1.59
## 82  BLAIR D                 6 EDWARDS DD                        1.53
## 83  CARTA F                 6 D'AMELIO S                        1.51
## 84  GODFREY DG              6 BANDONI SM                        1.50
## 85  GOMEZ EA                6 BURDON JJ                         1.50
## 86  JACKSON TFHG            6 CLAY K                            1.50
## 87  MADEIRA MF              6 CROFT BA                          1.50
## 88  MIMORI T                6 DE SOUSA MA                       1.50
## 89  MIRELMAN D              6 THOMPSON RCA                      1.50
## 90  MONTEIRO FA             6 EY PL                             1.49
## 91  MOTAZEDIAN MH           6 GIBSON WC                         1.48
## 92  MULVEY M                6 MICHELS PAM                       1.48
## 93  MURTA SMF               6 TOMAVO S                          1.48
## 94  READY PD                6 TACHIBANA H                       1.45
## 95  RODRIGUEZ-PAEZ L        6 ILINE II                          1.45
## 96  SARGEAUNT PG            6 LUN ZR                            1.45
## 97  SCHOFIELD CJ            6 TRUC P                            1.44
## 98  SCHONIAN G              6 PESSON B                          1.44
## 99  SITHITHAWORN P          6 BLANCO A                          1.43
## 100 STEINDEL M              6 MATHIEUDAUDE F                    1.43
## 
## 
## Top manuscripts per citations
## 
##                                                   Paper           TC TCperYear
## 1   FORSTERMANN U, 1994, HYPERTENSION                            815     33.96
## 2   LINHART YB, 1996, ANNU REV ECOL SYST                         798     36.27
## 3   ZINGALES B, 2009, MEM INST OSWALDO CRUZ                      505     56.11
## 4   RIOUX JA, 1990, ANNALES DE PARASITOLOGIE HUMAINE ET COMPAREE 425     15.18
## 5   SUPURAN CT, 2010, BIOORG MED CHEM LETT                       414     51.75
## 6   DIAMOND LS, 1993, J EUKARYOT MICROBIOL                       381     15.24
## 7   SUPURAN CT, 2007, BIOORG MED CHEM                            358     32.55
## 8   ARNAUD-HAOND S, 2007, MOL ECOL                               336     30.55
## 9   TIBAYRENC M, 1991, PROC NATL ACAD SCI U S A                  325     12.04
## 10  SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG-a             298      7.45
## 11  SUPURAN CT, 2008, CURR PHARM DESIGN                          276     27.60
## 12  MAGILL AJ, 1993, N ENGL J MED                                262     10.48
## 13  TIBAYRENC M, 1988, EVOLUTION                                 261      8.70
## 14  MILES MA, 1977, TRANS ROY SOC TROP MED HYG                   256      6.24
## 15  COLLINS FH, 1996, INSECT MOL BIOL                            254     11.55
## 16  MACHADO CA, 2001, PROC NATL ACAD SCI U S A                   247     14.53
## 17  HAMPL V, 2001, INT J SYST EVOL MICROBIOL                     237     13.94
## 18  BARRY M, 1997, CLIN PHARMACOKINET                            236     11.24
## 19  MONIS PT, 1999, MOL BIOL EVOL                                197     10.37
## 20  MATTIUCCI S, 1997, J PARASITOL                               197      9.38
## 21  DARDE ML, 1992, J PARASITOL                                  197      7.58
## 22  MACEDO AM, 2004, MEM INST OSWALDO CRUZ                       194     13.86
## 23  MONIS PT, 2003, INFECT GENET EVOL                            189     12.60
## 24  MOLLER AP, 1998, BEHAV ECOL SOCIOBIOL                        188      9.40
## 25  BLACK WC, 1992, BULL ENTOMOL RES                             187      7.19
## 26  PATEL MS, 2006, BIOCHEM SOC TRANS                            181     15.08
## 27  KREUTZER RD, 1980, AM J TROP MED HYG                         176      4.63
## 28  CLARK CG, 1991, MOL BIOCHEM PARASITOL                        171      6.33
## 29  HAQUE R, 1998, J CLIN MICROBIOL                              166      8.30
## 30  SOLTIS DE, 1991, AM J BOT-a                                  166      6.15
## 31  DYBDAHL MF, 1996, EVOLUTION                                  165      7.50
## 32  ZARLENGA DS, 1999, INT J PARASIT                             162      8.53
## 33  GOODWIN SB, 1995, PLANT DIS                                  162      7.04
## 34  CUPOLILLO E, 1995, MOL BIOCHEM PARASITOL                     162      7.04
## 35  BLUM J, 2004, J ANTIMICROB CHEMOTHER                         158     11.29
## 36  STEVENS JR, 1999, PARASITOLOGY                               158      8.32
## 37  LIM K, 1994, PROTEIN SCI                                     158      6.58
## 38  MUNDERLOH UG, 1994, J PARASITOL                              157      6.54
## 39  HOMAN WL, 2001, INT J PARASIT                                156      9.18
## 40  LOXDALE HD, 1998, BULL ENTOMOL RES                           154      7.70
## 41  BOWLES J, 1993, ACTA TROP                                    153      6.12
## 42  SCHWARZ D, 2005, NATURE                                      151     11.62
## 43  TIBAYRENC M, 1998, INT J PARASIT                             150      7.50
## 44  BURDON JJ, 1993, ANNU REV PHYTOPATHOL                        140      5.60
## 45  LEHMANN T, 1996, HEREDITY                                    138      6.27
## 46  EBERT D, 1998, PROC R SOC B-BIOL SCI                         137      6.85
## 47  LEHMANN T, 1998, MOL BIOL EVOL                               134      6.70
## 48  BARNABE C, 2000, PARASITOLOGY                                133      7.39
## 49  SCOZZAFAVA A, 2006, EXPERT OPIN THER PATENTS                 130     10.83
## 50  MATTIUCCI S, 2006, PARASITE-J SOC FR PARASITOL               130     10.83
## 51  LESSA EP, 1998, MOL PHYLOGENET EVOL                          126      6.30
## 52  THOMAS Y, 2003, EVOLUTION                                    124      8.27
## 53  ZINGALES B, 1998, INT J PARASIT                              123      6.15
## 54  MURTA SMF, 1998, MOL BIOCHEM PARASITOL                       122      6.10
## 55  REVOLLO S, 1998, EXP PARASITOL                               120      6.00
## 56  HAQUE R, 1995, J CLIN MICROBIOL                              120      5.22
## 57  ARNAUD-HAOND S, 2005, J HERED                                115      8.85
## 58  CAMERON P, 2004, J IMMUNOL                                   115      8.21
## 59  BAYMAN P, 1991, CAN J BOT -REV CAN BOT                       112      4.15
## 60  SMITH MA, 2008, MOL ECOL RESOUR                              111     11.10
## 61  NEFF BD, 2001, EVOLUTION                                     111      6.53
## 62  ZIJLSTRA C, 1995, PHYTOPATHOLOGY                             111      4.83
## 63  KREUTZER RD, 1983, AM J TROP MED HYG                         111      3.17
## 64  SUPURAN CT, 2007, CURR TOP MED CHEM                          110     10.00
## 65  TANNICH E, 1991, J CLIN MICROBIOL                            110      4.07
## 66  JACOBSON RL, 2003, J INFECT DIS                              109      7.27
## 67  ROSENTHAL E, 1995, TRANS ROY SOC TROP MED HYG                109      4.74
## 68  SAMUELSON J, 1991, J EXP MED                                 109      4.04
## 69  MAURICIO IL, 2006, INT J PARASIT                             108      9.00
## 70  EY PL, 1997, J EUKARYOT MICROBIOL                            108      5.14
## 71  NEVO E, 1998, GENET RESOUR CROP EVOL                         107      5.35
## 72  BESANSKY NJ, 1997, GENETICS                                  107      5.10
## 73  ANDERSON TJC, 1993, PARASITOLOGY                             107      4.28
## 74  MIRELMAN D, 1986, INFECT IMMUN                               104      3.25
## 75  SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG               104      2.60
## 76  ANTINORI S, 2007, CLIN INFECT DIS                            103      9.36
## 77  JARNE P, 1993, BIOL J LINNEAN SOC                            103      4.12
## 78  ACUNASOTO R, 1993, AM J TROP MED HYG                         103      4.12
## 79  VALENTINI A, 2006, J PARASITOL                               102      8.50
## 80  NAVAJAS M, 2000, EXP APPL ACAROL-a                           102      5.67
## 81  CARRASCO HJ, 1996, AM J TROP MED HYG                         101      4.59
## 82  NADLER SA, 2011, PARASITOLOGY                                 99     14.14
## 83  SCHWENKENBECHER JM, 2006, INT J PARASIT                       98      8.17
## 84  SUPURAN CT, 2010, CURR PHARM DESIGN                           97     12.12
## 85  WURGLER FE, 1992, MUTAGENESIS                                 96      3.69
## 86  MACLEOD A, 2000, PROC NATL ACAD SCI U S A                     95      5.28
## 87  MARTIN FN, 2000, MYCOLOGIA                                    95      5.28
## 88  BENNETT JW, 2013, N ENGL J MED                                94     18.80
## 89  TOLEDO MJD, 2003, ANTIMICROB AGENTS CHEMOTHER                 94      6.27
## 90  LEE JY, 1998, BIOCHEMISTRY                                    93      4.65
## 91  GASKIN AA, 2002, J VET INTERN MED                             91      5.69
## 92  OUDEMANS P, 1991, MYCOL RES                                   91      3.37
## 93  MIRELMAN D, 1986, EXP PARASITOL                               91      2.84
## 94  KUHLS K, 2005, MICROBES INFECT                                90      6.92
## 95  PRATLONG F, 2004, J CLIN MICROBIOL                            89      6.36
## 96  JERONIMO SMB, 1994, TRANS ROY SOC TROP MED HYG                89      3.71
## 97  MATTIUCCI S, 2002, SYST PARASITOL                             88      5.50
## 98  GRACE JM, 1998, DRUG METAB DISPOS                             88      4.40
## 99  MONIS PT, 1998, PARASITOLOGY                                  88      4.40
## 100 ANDERSON TJC, 1997, PARASITOLOGY                              88      4.19
## 
## 
## Most Productive Countries (of corresponding authors)
## 
##         Country   Articles     Freq SCP MCP MCP_Ratio
## 1  USA                 189 0.130525 134  55    0.2910
## 2  BRAZIL              137 0.094613 100  37    0.2701
## 3  UNITED KINGDOM      134 0.092541  76  58    0.4328
## 4  FRANCE              110 0.075967  69  41    0.3727
## 5  ITALY               100 0.069061  56  44    0.4400
## 6  JAPAN                67 0.046271  43  24    0.3582
## 7  AUSTRALIA            66 0.045580  53  13    0.1970
## 8  SPAIN                60 0.041436  34  26    0.4333
## 9  GERMANY              39 0.026934  24  15    0.3846
## 10 MEXICO               31 0.021409  24   7    0.2258
## 11 CANADA               30 0.020718  23   7    0.2333
## 12 INDIA                27 0.018646  22   5    0.1852
## 13 BELGIUM              26 0.017956  11  15    0.5769
## 14 SWITZERLAND          24 0.016575  14  10    0.4167
## 15 ARGENTINA            22 0.015193  16   6    0.2727
## 16 CHINA                19 0.013122  16   3    0.1579
## 17 COLOMBIA             19 0.013122  14   5    0.2632
## 18 KENYA                18 0.012431  11   7    0.3889
## 19 VENEZUELA            18 0.012431  11   7    0.3889
## 20 IRAN                 17 0.011740  13   4    0.2353
## 21 THAILAND             17 0.011740   4  13    0.7647
## 22 BOLIVIA              14 0.009669   3  11    0.7857
## 23 EGYPT                14 0.009669  11   3    0.2143
## 24 NEW ZEALAND          13 0.008978  12   1    0.0769
## 25 ISRAEL               12 0.008287   6   6    0.5000
## 26 NETHERLANDS          12 0.008287   3   9    0.7500
## 27 CZECH REPUBLIC       11 0.007597   6   5    0.4545
## 28 AUSTRIA              10 0.006906   8   2    0.2000
## 29 CHILE                10 0.006906   5   5    0.5000
## 30 SLOVAKIA             10 0.006906   0  10    1.0000
## 31 TURKEY               10 0.006906   8   2    0.2000
## 32 PORTUGAL              8 0.005525   1   7    0.8750
## 33 SOUTH AFRICA          8 0.005525   6   2    0.2500
## 34 TUNISIA               8 0.005525   1   7    0.8750
## 35 POLAND                7 0.004834   5   2    0.2857
## 36 SWEDEN                7 0.004834   5   2    0.2857
## 37 CAMEROON              6 0.004144   2   4    0.6667
## 38 DENMARK               6 0.004144   3   3    0.5000
## 39 ETHIOPIA              6 0.004144   1   5    0.8333
## 40 MOROCCO               6 0.004144   3   3    0.5000
## 41 FINLAND               5 0.003453   2   3    0.6000
## 42 IRAQ                  5 0.003453   2   3    0.6000
## 43 SUDAN                 5 0.003453   0   5    1.0000
## 44 GEORGIA               4 0.002762   2   2    0.5000
## 45 HUNGARY               4 0.002762   4   0    0.0000
## 46 KOREA                 4 0.002762   3   1    0.2500
## 47 MALAYSIA              4 0.002762   3   1    0.2500
## 48 PANAMA                4 0.002762   3   1    0.2500
## 49 RUSSIA                4 0.002762   3   1    0.2500
## 50 TAIWAN                4 0.002762   0   4    1.0000
## 51 UGANDA                4 0.002762   2   2    0.5000
## 52 ALGERIA               3 0.002072   1   2    0.6667
## 53 PARAGUAY              3 0.002072   1   2    0.6667
## 54 SERBIA                3 0.002072   2   1    0.3333
## 55 URUGUAY               3 0.002072   2   1    0.3333
## 56 ZIMBABWE              3 0.002072   0   3    1.0000
## 57 BANGLADESH            2 0.001381   1   1    0.5000
## 58 BULGARIA              2 0.001381   2   0    0.0000
## 59 COSTA RICA            2 0.001381   2   0    0.0000
## 60 CROATIA               2 0.001381   0   2    1.0000
## 61 ECUADOR               2 0.001381   1   1    0.5000
## 62 GREECE                2 0.001381   0   2    1.0000
## 63 IRELAND               2 0.001381   1   1    0.5000
## 64 LEBANON               2 0.001381   2   0    0.0000
## 65 MALTA                 2 0.001381   1   1    0.5000
## 66 PERU                  2 0.001381   1   1    0.5000
## 67 ROMANIA               2 0.001381   2   0    0.0000
## 68 SAUDI ARABIA          2 0.001381   0   2    1.0000
## 69 SRI LANKA             2 0.001381   0   2    1.0000
## 70 YEMEN                 2 0.001381   0   2    1.0000
## 71 BAHAMAS               1 0.000691   0   1    1.0000
## 72 BAHRAIN               1 0.000691   1   0    0.0000
## 73 BURKINA FASO          1 0.000691   0   1    1.0000
## 74 ESTONIA               1 0.000691   1   0    0.0000
## 75 GUATEMALA             1 0.000691   1   0    0.0000
## 76 MAURITANIA            1 0.000691   1   0    0.0000
## 77 NIGERIA               1 0.000691   1   0    0.0000
## 78 PAKISTAN              1 0.000691   0   1    1.0000
## 79 SLOVENIA              1 0.000691   1   0    0.0000
## 80 ZAMBIA                1 0.000691   0   1    1.0000
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  USA                       8362                     44.24
## 2  UNITED KINGDOM            5177                     38.63
## 3  FRANCE                    4789                     43.54
## 4  ITALY                     4070                     40.70
## 5  BRAZIL                    3603                     26.30
## 6  AUSTRALIA                 2204                     33.39
## 7  GERMANY                   1863                     47.77
## 8  JAPAN                     1224                     18.27
## 9  SPAIN                      927                     15.45
## 10 CANADA                     868                     28.93
## 11 SWITZERLAND                844                     35.17
## 12 ISRAEL                     579                     48.25
## 13 BELGIUM                    563                     21.65
## 14 PORTUGAL                   532                     66.50
## 15 NETHERLANDS                483                     40.25
## 16 BOLIVIA                    471                     33.64
## 17 CZECH REPUBLIC             463                     42.09
## 18 MEXICO                     438                     14.13
## 19 KENYA                      359                     19.94
## 20 ARGENTINA                  343                     15.59
## 21 THAILAND                   341                     20.06
## 22 COLOMBIA                   290                     15.26
## 23 SWEDEN                     278                     39.71
## 24 VENEZUELA                  262                     14.56
## 25 INDIA                      259                      9.59
## 26 CHILE                      213                     21.30
## 27 PANAMA                     206                     51.50
## 28 NEW ZEALAND                204                     15.69
## 29 IRAN                       200                     11.76
## 30 CHINA                      182                      9.58
## 31 DENMARK                    156                     26.00
## 32 URUGUAY                    155                     51.67
## 33 SOUTH AFRICA               152                     19.00
## 34 AUSTRIA                    147                     14.70
## 35 SLOVAKIA                   129                     12.90
## 36 TURKEY                     124                     12.40
## 37 ETHIOPIA                   121                     20.17
## 38 TUNISIA                    105                     13.12
## 39 FINLAND                    101                     20.20
## 40 SRI LANKA                   92                     46.00
## 41 IRAQ                        90                     18.00
## 42 ECUADOR                     82                     41.00
## 43 BANGLADESH                  77                     38.50
## 44 MALAYSIA                    76                     19.00
## 45 EGYPT                       73                      5.21
## 46 SUDAN                       72                     14.40
## 47 MOROCCO                     71                     11.83
## 48 UGANDA                      65                     16.25
## 49 GEORGIA                     59                     14.75
## 50 POLAND                      58                      8.29
## 51 ALGERIA                     52                     17.33
## 52 KOREA                       52                     13.00
## 53 CAMEROON                    48                      8.00
## 54 PERU                        48                     24.00
## 55 ZIMBABWE                    47                     15.67
## 56 GREECE                      42                     21.00
## 57 TAIWAN                      41                     10.25
## 58 BURKINA FASO                36                     36.00
## 59 MALTA                       34                     17.00
## 60 IRELAND                     31                     15.50
## 61 HUNGARY                     23                      5.75
## 62 LEBANON                     22                     11.00
## 63 SAUDI ARABIA                17                      8.50
## 64 CROATIA                     14                      7.00
## 65 SERBIA                      13                      4.33
## 66 PAKISTAN                    12                     12.00
## 67 PARAGUAY                    12                      4.00
## 68 ROMANIA                     12                      6.00
## 69 ZAMBIA                      12                     12.00
## 70 BAHRAIN                      9                      9.00
## 71 COSTA RICA                   9                      4.50
## 72 GUATEMALA                    9                      9.00
## 73 YEMEN                        9                      4.50
## 74 BAHAMAS                      8                      8.00
## 75 ESTONIA                      5                      5.00
## 76 RUSSIA                       4                      1.00
## 77 SLOVENIA                     2                      2.00
## 78 BULGARIA                     1                      0.50
## 79 MAURITANIA                   1                      1.00
## 80 NIGERIA                      1                      1.00
## 
## 
## Most Relevant Sources
## 
##                                                                             Sources        Articles
## 1   TRANSACTIONS OF THE ROYAL SOCIETY OF TROPICAL MEDICINE AND HYGIENE                           72
## 2   AMERICAN JOURNAL OF TROPICAL MEDICINE AND HYGIENE                                            70
## 3   PARASITOLOGY                                                                                 62
## 4   INTERNATIONAL JOURNAL FOR PARASITOLOGY                                                       55
## 5   MEMORIAS DO INSTITUTO OSWALDO CRUZ                                                           49
## 6   JOURNAL OF PARASITOLOGY                                                                      47
## 7   PARASITOLOGY RESEARCH                                                                        46
## 8   ACTA TROPICA                                                                                 42
## 9   MOLECULAR AND BIOCHEMICAL PARASITOLOGY                                                       37
## 10  EXPERIMENTAL PARASITOLOGY                                                                    34
## 11  JOURNAL OF MEDICAL ENTOMOLOGY                                                                24
## 12  ANNALS OF TROPICAL MEDICINE AND PARASITOLOGY                                                 22
## 13  INFECTION GENETICS AND EVOLUTION                                                             16
## 14  JOURNAL OF CLINICAL MICROBIOLOGY                                                             16
## 15  VETERINARY PARASITOLOGY                                                                      16
## 16  MEDICAL AND VETERINARY ENTOMOLOGY                                                            15
## 17  SYSTEMATIC PARASITOLOGY                                                                      15
## 18  EVOLUTION                                                                                    14
## 19  PARASITE-JOURNAL DE LA SOCIETE FRANCAISE DE PARASITOLOGIE                                    14
## 20  HEREDITY                                                                                     13
## 21  BIOORGANIC \\& MEDICINAL CHEMISTRY                                                           12
## 22  JOURNAL OF EUKARYOTIC MICROBIOLOGY                                                           11
## 23  MOLECULAR ECOLOGY                                                                            10
## 24  ANNALS OF THE ENTOMOLOGICAL SOCIETY OF AMERICA                                                9
## 25  APPLIED ENTOMOLOGY AND ZOOLOGY                                                                9
## 26  BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY                                                     9
## 27  EXPERIMENTAL AND APPLIED ACAROLOGY                                                            9
## 28  JOURNAL OF ENZYME INHIBITION AND MEDICINAL CHEMISTRY                                          9
## 29  JOURNAL OF PROTOZOOLOGY                                                                       9
## 30  PLOS ONE                                                                                      9
## 31  TROPICAL MEDICINE \\& INTERNATIONAL HEALTH                                                    9
## 32  BULLETIN OF ENTOMOLOGICAL RESEARCH                                                            8
## 33  COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY B-BIOCHEMISTRY \\& MOLECULAR   BIOLOGY                8
## 34  EXPERIMENTAL \\& APPLIED ACAROLOGY                                                            8
## 35  HELMINTHOLOGIA                                                                                8
## 36  PHYTOPATHOLOGY                                                                                8
## 37  PLANT DISEASE                                                                                 8
## 38  REVISTA DA SOCIEDADE BRASILEIRA DE MEDICINA TROPICAL                                          8
## 39  BIOORGANIC \\& MEDICINAL CHEMISTRY LETTERS                                                    7
## 40  CLINICAL INFECTIOUS DISEASES                                                                  7
## 41  JOURNAL OF INFECTIOUS DISEASES                                                                7
## 42  JOURNAL OF NEMATOLOGY                                                                         7
## 43  MALACOLOGIA                                                                                   7
## 44  JOURNAL OF HELMINTHOLOGY                                                                      6
## 45  NEMATOLOGY                                                                                    6
## 46  PARASITES \\& VECTORS                                                                         6
## 47  PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF   AMERICA             6
## 48  AMERICAN JOURNAL OF BOTANY                                                                    5
## 49  ANNALES DE PARASITOLOGIE HUMAINE ET COMPAREE                                                  5
## 50  BIOCHEMICAL SYSTEMATICS AND ECOLOGY                                                           5
## 51  EUROPEAN JOURNAL OF BIOCHEMISTRY                                                              5
## 52  FISHERIES RESEARCH                                                                            5
## 53  ICOPA IX - 9TH INTERNATIONAL CONGRESS OF PARASITOLOGY                                         5
## 54  JOURNAL OF EVOLUTIONARY BIOLOGY                                                               5
## 55  JOURNAL OF HEREDITY                                                                           5
## 56  JOURNAL OF THE AMERICAN MOSQUITO CONTROL ASSOCIATION                                          5
## 57  MYCOLOGICAL RESEARCH                                                                          5
## 58  PARASITE                                                                                      5
## 59  PARASITOLOGY INTERNATIONAL                                                                    5
## 60  ARCHIVES OF MEDICAL RESEARCH                                                                  4
## 61  BIOCHEMICAL GENETICS                                                                          4
## 62  BIOCHEMISTRY                                                                                  4
## 63  BMC INFECTIOUS DISEASES                                                                       4
## 64  CANADIAN JOURNAL OF ZOOLOGY-REVUE CANADIENNE DE ZOOLOGIE                                      4
## 65  FEMS MICROBIOLOGY LETTERS                                                                     4
## 66  GENETICA                                                                                      4
## 67  INFECTION AND IMMUNITY                                                                        4
## 68  INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY                                                     4
## 69  INTERNATIONAL JOURNAL OF DERMATOLOGY                                                          4
## 70  JOURNAL OF BIOLOGICAL CHEMISTRY                                                               4
## 71  JOURNAL OF VECTOR ECOLOGY                                                                     4
## 72  MOLECULAR BIOLOGY AND EVOLUTION                                                               4
## 73  PARASITOLOGY TODAY                                                                            4
## 74  PLANT PATHOLOGY                                                                               4
## 75  PLOS NEGLECTED TROPICAL DISEASES                                                              4
## 76  PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES                                        4
## 77  TROPENMEDIZIN UND PARASITOLOGIE                                                               4
## 78  TROPICAL MEDICINE AND PARASITOLOGY                                                            4
## 79  ACTA PARASITOLOGICA                                                                           3
## 80  ANTIMICROBIAL AGENTS AND CHEMOTHERAPY                                                         3
## 81  BEHAVIORAL ECOLOGY AND SOCIOBIOLOGY                                                           3
## 82  BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS                                           3
## 83  BIOCHEMICAL PHARMACOLOGY                                                                      3
## 84  BIOLOGICAL CONTROL                                                                            3
## 85  BULLETIN DE LA SOCIETE DE PATHOLOGIE EXOTIQUE                                                 3
## 86  CANADIAN JOURNAL OF BOTANY-REVUE CANADIENNE DE BOTANIQUE                                      3
## 87  CHINESE MEDICAL JOURNAL                                                                       3
## 88  COMPTES RENDUS DE L ACADEMIE DES SCIENCES SERIE III-SCIENCES DE LA   VIE-LIFE SCIENCES        3
## 89  ENTOMOLOGIA EXPERIMENTALIS ET APPLICATA                                                       3
## 90  EUROPEAN JOURNAL OF ENTOMOLOGY                                                                3
## 91  GENETICS                                                                                      3
## 92  INDIAN JOURNAL OF MEDICAL RESEARCH                                                            3
## 93  INDIAN JOURNAL OF MEDICAL RESEARCH SECTION A-INFECTIOUS DISEASES                              3
## 94  INFECTIOUS AGENTS AND DISEASE-REVIEWS ISSUES AND COMMENTARY                                   3
## 95  INSECT SCIENCE AND ITS APPLICATION                                                            3
## 96  INSECTES SOCIAUX                                                                              3
## 97  INTERNATIONAL JOURNAL OF FOOD MICROBIOLOGY                                                    3
## 98  INVESTIGACION CLINICA                                                                         3
## 99  JAPANESE JOURNAL OF APPLIED ENTOMOLOGY AND ZOOLOGY                                            3
## 100 JOURNAL OF FISH BIOLOGY                                                                       3
## 
## 
## Most Relevant Keywords
## 
##             Author Keywords (DE)      Articles     Keywords-Plus (ID)     Articles
## 1   ALLOZYMES                               83 IDENTIFICATION                  179
## 2   TRYPANOSOMA CRUZI                       75 DIFFERENTIATION                 110
## 3   ISOENZYMES                              53 POPULATIONS                     103
## 4   ISOENZYME                               52 DNA                              87
## 5   ELECTROPHORESIS                         42 CHAGAS-DISEASE                   78
## 6   POPULATION GENETICS                     40 STRAINS                          78
## 7   LEISHMANIA                              36 EVOLUTION                        72
## 8   TAXONOMY                                36 INFECTION                        68
## 9   PHYLOGENY                               32 VARIABILITY                      67
## 10  LEISHMANIASIS                           31 CUTANEOUS LEISHMANIASIS          66
## 11  LEISHMANIA INFANTUM                     30 ELECTROPHORESIS                  60
## 12  ALLOZYME                                29 ISOENZYME PATTERNS               55
## 13  CARBONIC ANHYDRASE                      28 BRAZIL                           49
## 14  EPIDEMIOLOGY                            26 VISCERAL LEISHMANIASIS           47
## 15  CHAGAS DISEASE                          25 ISOENZYME                        46
## 16  PCR                                     25 PARASITES                        46
## 17  ISOZYME                                 23 EXPRESSION                       44
## 18  ISOZYMES                                23 NATURAL-POPULATIONS              44
## 19  GENE FLOW                               22 ZYMODEMES                        43
## 20  GENETIC VARIATION                       22 DIVERSITY                        42
## 21  ZYMODEME                                22 POPULATION                       38
## 22  DIAGNOSIS                               21 DIAGNOSIS                        37
## 23  GENETIC DIVERSITY                       21 DIPTERA                          37
## 24  RAPD                                    21 COMPLEX                          36
## 25  ENTAMOEBA HISTOLYTICA                   20 RESISTANCE                       36
## 26  MALARIA                                 20 POPULATION-STRUCTURE             35
## 27  BRAZIL                                  19 AGENT                            34
## 28  CUTANEOUS LEISHMANIASIS                 19 ALLOZYME                         34
## 29  SYSTEMATICS                             19 PLASMODIUM-FALCIPARUM            34
## 30  ALLOZYME ELECTROPHORESIS                17 TRANSMISSION                     34
## 31  CHARACTERIZATION                        17 TRYPANOSOMA-CRUZI                34
## 32  MORPHOLOGY                              17 PATTERNS                         33
## 33  SPECIATION                              17 POLYMERASE CHAIN-REACTION        32
## 34  ISOENZYME ELECTROPHORESIS               16 STOCKS                           32
## 35  POLYMORPHISM                            16 POLYMERASE-CHAIN-REACTION        31
## 36  RESISTANCE                              16 POPULATION-GENETICS              31
## 37  ENTAMOEBA DISPAR                        15 SEQUENCES                        31
## 38  MITOCHONDRIAL DNA                       15 AMPLIFICATION                    30
## 39  GENETIC VARIABILITY                     14 MITOCHONDRIAL-DNA                30
## 40  LEISHMANIA TROPICA                      14 POLYMORPHISM                     30
## 41  MICROSATELLITES                         14 MONOCLONAL-ANTIBODIES            29
## 42  PARASITE                                14 CULICIDAE                        28
## 43  POPULATION STRUCTURE                    14 ENTAMOEBA-HISTOLYTICA            28
## 44  VISCERAL LEISHMANIASIS                  14 PURIFICATION                     28
## 45  GENETIC                                 13 SEQUENCE                         28
## 46  GENETIC STRUCTURE                       13 KINETOPLAST DNA                  27
## 47  GENETICS                                13 MARKERS                          27
## 48  LEISHMANIA DONOVANI                     13 MICE                             27
## 49  TRYPANOSOMA BRUCEI                      13 EPIDEMIOLOGY                     26
## 50  ESTERASE                                12 HOST                             26
## 51  EVOLUTION                               12 ISOENZYME CHARACTERIZATION       26
## 52  POPULATION                              12 LEISHMANIASIS                    26
## 53  PROTOZOA                                12 PARASITIC PROTOZOA               26
## 54  SULFONAMIDE                             12 SYSTEMATICS                      26
## 55  ZYMODEMES                               12 CLONES                           25
## 56  AMEBIASIS                               11 OLD-WORLD                        25
## 57  CHAGAS' DISEASE                         11 VIRULENCE                        25
## 58  LEISHMANIA MAJOR                        11 DISTANCE                         24
## 59  TETRANYCHUS URTICAE                     11 AMEBIASIS                        23
## 60  HYBRIDIZATION                           10 BRUCEI                           23
## 61  IRAN                                    10 ESCHERICHIA-COLI                 23
## 62  ISOENZYME ANALYSIS                      10 GENETIC-VARIATION                23
## 63  MOLECULAR                               10 PARASITE                         23
## 64  MORPHOMETRICS                           10 LEISHMANIA                       22
## 65  NEMATODE                                10 PSYCHODIDAE                      22
## 66  POLYMERASE CHAIN REACTION               10 RIBOSOMAL DNA                    22
## 67  SIBLING SPECIES                         10 AMPLIFIED POLYMORPHIC DNA        21
## 68  TRYPANOSOMA-BRUCEI                      10 DISEASE                          21
## 69  TRYPANOSOMA-CRUZI                       10 GENE                             21
## 70  COLOMBIA                                 9 ISOZYME-II                       21
## 71  DNA                                      9 METABOLISM                       21
## 72  HETEROZYGOSITY                           9 PCR                              21
## 73  IDENTIFICATION                           9 PROTEIN                          21
## 74  INHIBITOR                                9 SPECIATION                       21
## 75  MULTILOCUS ENZYME ELECTROPHORESIS        9 CLASSIFICATION                   20
## 76  RFLP                                     9 DONOVANI                         20
## 77  CHAGAS                                   8 INFANTUM                         20
## 78  CRYPTIC SPECIES                          8 INVITRO                          20
## 79  ENTAMOEBA-HISTOLYTICA                    8 POLYMORPHISMS                    20
## 80  GENETIC EXCHANGE                         8 FLOW                             19
## 81  HYMENOPTERA                              8 GENES                            19
## 82  METALLOENZYMES                           8 ASCARIDOIDEA                     18
## 83  MICROSATELLITE                           8 CLONING                          18
## 84  MOLECULAR SYSTEMATICS                    8 PROTEINS                         18
## 85  PATHOGENICITY                            8 TAXONOMY                         18
## 86  SELECTION                                8 CRUZI                            17
## 87  SPECIES COMPLEX                          8 CRYSTAL-STRUCTURE                17
## 88  TRIATOMA INFESTANS                       8 ENZYMES                          17
## 89  VARIATION                                8 ISOENZYME ELECTROPHORESIS        17
## 90  BOLIVIA                                  7 SUBGENUS TRYPANOZOON             17
## 91  CHINA                                    7 ACTIVE-SITE                      16
## 92  COEVOLUTION                              7 ENZYME                           16
## 93  DISEASE                                  7 ISOENZYME ANALYSIS               16
## 94  DOG                                      7 ISOZYME                          16
## 95  DRUG RESISTANCE                          7 MOSQUITOS                        16
## 96  ETHIOPIA                                 7 RIBOSOMAL-RNA                    16
## 97  GENETIC DIFFERENTIATION                  7 SANDFLIES                        16
## 98  GENETIC DISTANCE                         7 ARBITRARY PRIMERS                15
## 99  GIARDIA                                  7 GENETIC DIFFERENTIATION          15
## 100 GLYCOLYSIS                               7 GENUS                            15

We may want to save the top 10 publishing countries for this dataset.

citations_ana.sum$MostProdCountries
##         Country   Articles     Freq SCP MCP MCP_Ratio
## 1  USA                 189 0.130525 134  55    0.2910
## 2  BRAZIL              137 0.094613 100  37    0.2701
## 3  UNITED KINGDOM      134 0.092541  76  58    0.4328
## 4  FRANCE              110 0.075967  69  41    0.3727
## 5  ITALY               100 0.069061  56  44    0.4400
## 6  JAPAN                67 0.046271  43  24    0.3582
## 7  AUSTRALIA            66 0.045580  53  13    0.1970
## 8  SPAIN                60 0.041436  34  26    0.4333
## 9  GERMANY              39 0.026934  24  15    0.3846
## 10 MEXICO               31 0.021409  24   7    0.2258
## 11 CANADA               30 0.020718  23   7    0.2333
## 12 INDIA                27 0.018646  22   5    0.1852
## 13 BELGIUM              26 0.017956  11  15    0.5769
## 14 SWITZERLAND          24 0.016575  14  10    0.4167
## 15 ARGENTINA            22 0.015193  16   6    0.2727
## 16 CHINA                19 0.013122  16   3    0.1579
## 17 COLOMBIA             19 0.013122  14   5    0.2632
## 18 KENYA                18 0.012431  11   7    0.3889
## 19 VENEZUELA            18 0.012431  11   7    0.3889
## 20 IRAN                 17 0.011740  13   4    0.2353
## 21 THAILAND             17 0.011740   4  13    0.7647
## 22 BOLIVIA              14 0.009669   3  11    0.7857
## 23 EGYPT                14 0.009669  11   3    0.2143
## 24 NEW ZEALAND          13 0.008978  12   1    0.0769
## 25 ISRAEL               12 0.008287   6   6    0.5000
## 26 NETHERLANDS          12 0.008287   3   9    0.7500
## 27 CZECH REPUBLIC       11 0.007597   6   5    0.4545
## 28 AUSTRIA              10 0.006906   8   2    0.2000
## 29 CHILE                10 0.006906   5   5    0.5000
## 30 SLOVAKIA             10 0.006906   0  10    1.0000
## 31 TURKEY               10 0.006906   8   2    0.2000
## 32 PORTUGAL              8 0.005525   1   7    0.8750
## 33 SOUTH AFRICA          8 0.005525   6   2    0.2500
## 34 TUNISIA               8 0.005525   1   7    0.8750
## 35 POLAND                7 0.004834   5   2    0.2857
## 36 SWEDEN                7 0.004834   5   2    0.2857
## 37 CAMEROON              6 0.004144   2   4    0.6667
## 38 DENMARK               6 0.004144   3   3    0.5000
## 39 ETHIOPIA              6 0.004144   1   5    0.8333
## 40 MOROCCO               6 0.004144   3   3    0.5000
## 41 FINLAND               5 0.003453   2   3    0.6000
## 42 IRAQ                  5 0.003453   2   3    0.6000
## 43 SUDAN                 5 0.003453   0   5    1.0000
## 44 GEORGIA               4 0.002762   2   2    0.5000
## 45 HUNGARY               4 0.002762   4   0    0.0000
## 46 KOREA                 4 0.002762   3   1    0.2500
## 47 MALAYSIA              4 0.002762   3   1    0.2500
## 48 PANAMA                4 0.002762   3   1    0.2500
## 49 RUSSIA                4 0.002762   3   1    0.2500
## 50 TAIWAN                4 0.002762   0   4    1.0000
## 51 UGANDA                4 0.002762   2   2    0.5000
## 52 ALGERIA               3 0.002072   1   2    0.6667
## 53 PARAGUAY              3 0.002072   1   2    0.6667
## 54 SERBIA                3 0.002072   2   1    0.3333
## 55 URUGUAY               3 0.002072   2   1    0.3333
## 56 ZIMBABWE              3 0.002072   0   3    1.0000
## 57 BANGLADESH            2 0.001381   1   1    0.5000
## 58 BULGARIA              2 0.001381   2   0    0.0000
## 59 COSTA RICA            2 0.001381   2   0    0.0000
## 60 CROATIA               2 0.001381   0   2    1.0000
## 61 ECUADOR               2 0.001381   1   1    0.5000
## 62 GREECE                2 0.001381   0   2    1.0000
## 63 IRELAND               2 0.001381   1   1    0.5000
## 64 LEBANON               2 0.001381   2   0    0.0000
## 65 MALTA                 2 0.001381   1   1    0.5000
## 66 PERU                  2 0.001381   1   1    0.5000
## 67 ROMANIA               2 0.001381   2   0    0.0000
## 68 SAUDI ARABIA          2 0.001381   0   2    1.0000
## 69 SRI LANKA             2 0.001381   0   2    1.0000
## 70 YEMEN                 2 0.001381   0   2    1.0000
## 71 BAHAMAS               1 0.000691   0   1    1.0000
## 72 BAHRAIN               1 0.000691   1   0    0.0000
## 73 BURKINA FASO          1 0.000691   0   1    1.0000
## 74 ESTONIA               1 0.000691   1   0    0.0000
## 75 GUATEMALA             1 0.000691   1   0    0.0000
## 76 MAURITANIA            1 0.000691   1   0    0.0000
## 77 NIGERIA               1 0.000691   1   0    0.0000
## 78 PAKISTAN              1 0.000691   0   1    1.0000
## 79 SLOVENIA              1 0.000691   1   0    0.0000
## 80 ZAMBIA                1 0.000691   0   1    1.0000
#bar chart of top 10 countries
df_count<-data.frame(Country=as.character(citations_ana.sum$MostProdCountries$`Country  `),Article_count=as.integer(citations_ana.sum$MostProdCountries$Articles)) %>% slice(.,1:10)

ggplot(df_count, aes(Country, Article_count)) +
  geom_bar(stat = "identity",fill=brewer.pal(10, "Spectral")) +
  coord_flip() +
  theme_bw() 

#with an 'everyone else' category: sum the article counts of all countries outside the top 10
vec<-as.data.frame(citations_ana$Countries,stringsAsFactors = FALSE) %>% filter(!Tab %in% trimws(as.character(df_count$Country))) %>% select(.,Freq) %>% sum()
vec2<-data.frame(Country='OTHER',Article_count=as.integer(vec))
df_count<-rbind(df_count,vec2)

ggplot(df_count, aes(Country, Article_count)) +
  geom_bar(stat = "identity",fill=brewer.pal(11, "Spectral")) +
  coord_flip() +
  theme_bw() 

#write it out as a table
write.table(citations_ana.sum$MostProdCountries,'TopProducingCountriesForAllozymeParasiteSearch',row.names=F,quote=F,sep='\t')
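
The "top countries plus everyone else" idea can be sketched on a small example. This is only a sketch in base R (so it runs without dplyr); the five counts are taken from the table above, and `top_n`, `sorted` and `combined` are names made up for illustration.

```r
# Article counts for five countries (values copied from the table above)
counts <- c(USA = 189, BRAZIL = 137, `UNITED KINGDOM` = 134, FRANCE = 110, ITALY = 100)

top_n <- 3                                   # hypothetical cut-off; the tutorial uses 10
sorted <- sort(counts, decreasing = TRUE)

# Keep the top_n countries as they are
top <- data.frame(Country = names(sorted)[1:top_n],
                  Article_count = as.integer(sorted[1:top_n]),
                  stringsAsFactors = FALSE)

# Collapse everything else into a single OTHER row
other <- data.frame(Country = "OTHER",
                    Article_count = as.integer(sum(sorted[-(1:top_n)])),
                    stringsAsFactors = FALSE)

combined <- rbind(top, other)
print(combined)
```

The OTHER row should always be checked against the total: the counts in `combined` must add up to the sum of the original counts.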

We are interested in how many papers are produced per year, which we can see in the summary object. We can also calculate how long it took for a given percentage of the papers to be published.

#citations_ana.sum$AnnualProduction
#to see when XX % of papers were published
table<-citations_ana.sum$AnnualProduction %>% mutate(cumsum=cumsum(Articles),cumper=cumsum(Articles)/sum(Articles)*100)
table
##    Year    Articles cumsum     cumper
## 1     1966        1      1   0.066357
## 2     1968        1      2   0.132714
## 3     1973        1      3   0.199071
## 4     1974        1      4   0.265428
## 5     1975        3      7   0.464499
## 6     1977        2      9   0.597213
## 7     1978        4     13   0.862641
## 8     1979        3     16   1.061712
## 9     1980        6     22   1.459854
## 10    1981        3     25   1.658925
## 11    1982        9     34   2.256138
## 12    1983        7     41   2.720637
## 13    1984       11     52   3.450564
## 14    1985        5     57   3.782349
## 15    1986        7     64   4.246848
## 16    1987       15     79   5.242203
## 17    1988        5     84   5.573988
## 18    1989        8     92   6.104844
## 19    1990       17    109   7.232913
## 20    1991       67    176  11.678832
## 21    1992       84    260  17.252820
## 22    1993       70    330  21.897810
## 23    1994       67    397  26.343729
## 24    1995       66    463  30.723291
## 25    1996       65    528  35.036496
## 26    1997       72    600  39.814200
## 27    1998       95    695  46.118115
## 28    1999       82    777  51.559390
## 29    2000       67    844  56.005309
## 30    2001       51    895  59.389516
## 31    2002       54    949  62.972794
## 32    2003       57   1006  66.755143
## 33    2004       60   1066  70.736563
## 34    2005       41   1107  73.457200
## 35    2006       57   1164  77.239549
## 36    2007       42   1206  80.026543
## 37    2008       44   1250  82.946251
## 38    2009       40   1290  85.600531
## 39    2010       27   1317  87.392170
## 40    2011       27   1344  89.183809
## 41    2012       19   1363  90.444592
## 42    2013       22   1385  91.904446
## 43    2014       23   1408  93.430657
## 44    2015       28   1436  95.288653
## 45    2016       29   1465  97.213006
## 46    2017       23   1488  98.739217
## 47    2018       19   1507 100.000000
write.table(table,'ProductionPerYearForAllozymeParasiteSearch',row.names=F,quote=F,sep='\t')
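
To pull a single answer out of a cumulative table like the one above (for example, the year by which half of the papers had been published), something along these lines should work. It is a sketch on made-up numbers, not the allozyme results, and `year_reaching` is a hypothetical helper. It assumes the year column is numeric; if yours is a factor or character, convert it first with `as.numeric(as.character(...))`.

```r
# Toy annual production table (hypothetical numbers)
annual <- data.frame(Year = c(2000, 2001, 2002, 2003, 2004),
                     Articles = c(2, 3, 5, 6, 4))

# Same idea as the mutate() call above: running total and running percentage
annual$cum_articles <- cumsum(annual$Articles)
annual$cum_percent <- annual$cum_articles / sum(annual$Articles) * 100

# Hypothetical helper: first year at which the cumulative share reaches `pct` percent
year_reaching <- function(df, pct) {
  df$Year[which(df$cum_percent >= pct)[1]]
}

year_half <- year_reaching(annual, 50)
years_to_half <- year_half - min(annual$Year)  # years since the first paper
print(c(year_half = year_half, years_to_half = years_to_half))
```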

#basic line graph
ggplot(citations_ana.sum$AnnualProduction, aes(`Year   `,Articles, group=1)) +
  geom_line(aes(`Year   `,Articles))

ggplot(citations_ana.sum$AnnualProduction, aes(`Year   `,Articles, group=1)) +
  geom_point(size = 3,colour='red') + #inherits x and y from the ggplot() call
  geom_line(aes(`Year   `,Articles)) +
  labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

#create some splines to smooth the curve
spline_int <- as.data.frame(spline(citations_ana.sum$AnnualProduction$`Year   `, citations_ana.sum$AnnualProduction$Articles))

#just make it look a bit prettier
ggplot(citations_ana.sum$AnnualProduction) + 
  geom_point(aes(`Year   `,Articles), size = 3) +
  geom_line(data = spline_int, aes(x,y)) +
  geom_area(data = spline_int, aes(x,y,fill='red'),alpha=0.6) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
  scale_fill_manual(labels = "Parasites", values = alpha("red",.6))
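
To see what `spline()` actually hands back before it goes into ggplot, here is a small sketch on made-up numbers (the years and counts are hypothetical). `spline()` from the stats package fits a cubic interpolating spline and returns a list with `x` and `y`, evaluated at `n` evenly spaced points.

```r
# Made-up yearly counts
years <- c(2000, 2001, 2002, 2003)
counts <- c(1, 4, 2, 6)

# n = 31 gives one point every 0.1 years between 2000 and 2003
smooth <- as.data.frame(spline(years, counts, n = 31))
head(smooth)

# The curve passes through the original points, e.g. the value at 2001
at_2001 <- smooth$y[which.min(abs(smooth$x - 2001))]
print(at_2001)
```

One thing to watch: an interpolating spline can overshoot between points, so the smoothed curve may dip below zero or rise above the largest real count. It is worth checking `range(smooth$y)` before treating the curve as anything more than decoration.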

We also want to make some word clouds. There are two ways to use word clouds. The first is to treat every word in the keyword list individually. There are also two types of keywords: the Author Keywords and the ‘Keywords-Plus’, which are generated by WoS. I'm going to use the Author Keywords, but the lines for the ‘Keywords-Plus’ are included as comments.

For author keywords: citations_ana.sum$MostRelKeywords$`Author Keywords (DE)     `

For WoS keywords: citations_ana.sum$MostRelKeywords$`Keywords-Plus (ID)    `

Here we will build a new data frame of our keywords and filter out the stuff we don't want before creating a corpus, which is a sort of list used by text mining packages in R. I'm not totally sure what is special about it, but we need it!

I have done this example on the allozyme data and have tried roughly to filter out method-specific terms. If you are in the allozyme group you will need to remove this filter and redo it properly, as I did this very quickly and may have over- or under-filtered!

#careful with this table: the biblio function has built it with two columns that are both named 'Articles'
head(citations_ana.sum$MostRelKeywords)
##           Author Keywords (DE)      Articles     Keywords-Plus (ID)     Articles
## 1 ALLOZYMES                               83 IDENTIFICATION                  179
## 2 TRYPANOSOMA CRUZI                       75 DIFFERENTIATION                 110
## 3 ISOENZYMES                              53 POPULATIONS                     103
## 4 ISOENZYME                               52 DNA                              87
## 5 ELECTROPHORESIS                         42 CHAGAS-DISEASE                   78
## 6 POPULATION GENETICS                     40 STRAINS                          78
colnames(citations_ana.sum$MostRelKeywords)
## [1] "Author Keywords (DE)     " "Articles"                  "Keywords-Plus (ID)    "    "Articles"
#trim the padding whitespace from the keywords and create data frame
forwordcloud<-as.data.frame(cbind(as.character(trimws(citations_ana.sum$MostRelKeywords$`Author Keywords (DE)     `, which = "both")),citations_ana.sum$MostRelKeywords[2]),stringsAsFactors=FALSE)
colnames(forwordcloud)<-c('keyword','count_papers')

#if we want to plot 'Keywords-Plus (ID)'
#forwordcloud<-as.data.frame(cbind(as.character(trimws(citations_ana.sum$MostRelKeywords$`Keywords-Plus (ID)    `, which = "both")),citations_ana.sum$MostRelKeywords[4]),stringsAsFactors=FALSE)

#dataframe:
head(forwordcloud)
##               keyword count_papers
## 1           ALLOZYMES           83
## 2   TRYPANOSOMA CRUZI           75
## 3          ISOENZYMES           53
## 4           ISOENZYME           52
## 5     ELECTROPHORESIS           42
## 6 POPULATION GENETICS           40
#we want to drop the keywords we searched for from our dataframe
forwordcloud<- forwordcloud %>% filter(!grepl('allozyme|electrophoresis|isoenzyme|isozyme|rapd|carbonic anhydrase|aflp|creatine kinase|protein kinase|alkaline phosphatase|cytochrome P450|glutathione S-transferase|alcohol dehydrogenase|lactate dehydrogenase|catalase|aldehyde dehydrogenase|hexokinase|peroxidase|5 alpha-reductase',keyword,ignore.case = TRUE))

#create corpus
forwordcloud.Corpus<-Corpus(VectorSource(forwordcloud[rep(row.names(forwordcloud), forwordcloud$count_papers), 1]))

#can use the function inspect to display the information on the corpus
#inspect(forwordcloud.Corpus)

#we don't have any special characters, but we could remove odd characters if we had them
#these calls will throw a 'transformation drops documents' warning, but don't worry: no documents are actually dropped; the warning appears because the corpus was built from a vector source
#forwordcloud.Corpus<- tm_map(forwordcloud.Corpus, removePunctuation)
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, removeNumbers)
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, stripWhitespace)
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, remove_stopwords) #with package 'tau' 
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus,content_transformer(tolower))
#qdap package offers other cleaning functions if we need them

#create wordclouds
wordcloud(forwordcloud.Corpus,scale=c(2.0,.6),max.words=30)

#make it pretty: look up brewer.pal for colour palettes https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf

wordcloud(forwordcloud.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,scale=c(2.0,.6))

You can see that both ‘genetic’ and ‘genetics’ come up. You can try the stemming function to reduce each word to its root, but I found the result a bit ugly and ended up merging the duplicates by hand.

#stemming function
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus,content_transformer(tolower))
#forwordcloud.doc<- tm_map(forwordcloud.Corpus, stemDocument, "english")
#wordcloud(forwordcloud.doc,colors=brewer.pal(8, "Dark2"))

#code to reduce redundancy by hand
forwordcloud<-forwordcloud %>%  mutate(fixkeyword=sub("GENETICS", "GENETIC", keyword)) 

forwordcloud.Corpus<-Corpus(VectorSource(forwordcloud[rep(row.names(forwordcloud), forwordcloud$count_papers), 3]))

wordcloud(forwordcloud.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,scale=c(2.2,.6))

#you can change the percentage that are rotated with the rot.per call
wordcloud(forwordcloud.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,rot.per=0,scale=c(1.8,.8))
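
The by-hand approach can be checked on a small example. This is a minimal sketch with an invented kw table: sub() rewrites the first match in each keyword, and the word counts for the merged form then add up across rows.

```r
# toy keyword table, invented for illustration only
kw <- data.frame(
  keyword = c("POPULATION GENETICS", "GENETIC DIVERSITY", "GENETICS"),
  count_papers = c(40, 12, 5),
  stringsAsFactors = FALSE
)

# sub() replaces only the first match per string; use gsub() to replace every match
kw$fixkeyword <- sub("GENETICS", "GENETIC", kw$keyword, fixed = TRUE)

# count single words, weighting each keyword by its number of papers
words <- unlist(strsplit(rep(kw$fixkeyword, times = kw$count_papers), " "))
freq <- table(words)
freq[["GENETIC"]]
```

If many variants need merging, a named lookup vector applied in a loop may be easier to maintain than one sub() call per term.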

The second way to make a word cloud is to consider the whole phrase a word. I think this makes more sense but it may not be as nice to look at.

wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors="black",max.words=30,scale=c(2.0,.6))

#colours and font
wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors=brewer.pal(8, "Set1"),max.words=30,scale=c(2.0,.6))

wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors=brewer.pal(8, "Dark2"),vfont=c("script","bold"),max.words=30,rot.per=0,scale=c(2.0,.6))

wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors=brewer.pal(8, "Dark2"),family = "mono",font = 2,max.words=30,scale=c(2.0,.6))

Okay, we have our parasite plots, but what we really want is to compare these to the other search terms.

Use this block if you are comparing parasite searches that you downloaded 500 at a time to the ‘publication per regions’ and ‘publication per year’ files that you downloaded from the WoS website for the general search terms (via the analyse function in WoS).

Bring in the file that you downloaded from WoS and set it up with our original file from above:

d1v2<-read.table('../broadersearchers/AllozymePerYearBroadSearch.txt', sep="\t",header=T,row.names=NULL)
head(d1v2)
##   Publication Years records_._of_38597
## 1        2019     1              0.003
## 2        2018   432              1.119
## 3        2017   570              1.477
## 4        2016   583              1.510
## 5        2015   580              1.503
## 6        2014   676              1.751
colnames(d1v2) <- c("Year", "ArticlesGeneral","PercentArticles")
d1v2<-d1v2 %>%  arrange(.,Year) %>%  mutate(PercentPerYearGeneral=cumsum(ArticlesGeneral)/sum(ArticlesGeneral)*100) %>% select(.,-PercentArticles)
d1v2$Year <- as.character(d1v2$Year)
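
Note that, despite the column name, PercentPerYearGeneral is a running (cumulative) percentage. A minimal sketch of the calculation on invented numbers, in base R rather than dplyr:

```r
# toy counts, invented for illustration only (deliberately out of order)
counts <- data.frame(Year = c(2003, 2001, 2002), Articles = c(30, 10, 60))

# sort by year first, otherwise the running total is meaningless
counts <- counts[order(counts$Year), ]

# cumulative share of all articles published up to and including each year
counts$PercentCumulative <- cumsum(counts$Articles) / sum(counts$Articles) * 100
counts
```

The last year always reaches 100, which is a quick way to check that the sort and the sum were applied to the same rows.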

#have to merge them with earlier dataset
df1<-citations_ana.sum$AnnualProduction 
colnames(df1)
## [1] "Year   "  "Articles"
#this is a good example of how not to name column names-the biblio package adds a bunch of trailing white space which is super frustrating to work around
colnames(df1) <- c("Year", "ArticlesParasite")
df1<-df1 %>%  arrange(.,Year) %>%  mutate(PercentPerYearParasites=cumsum(ArticlesParasite)/sum(ArticlesParasite)*100)
df1$Year <- as.character(df1$Year)

dmerged<-full_join(d1v2,df1,by='Year')
head(dmerged)
##   Year ArticlesGeneral PercentPerYearGeneral ArticlesParasite PercentPerYearParasites
## 1 1960               1           0.002590875               NA                      NA
## 2 1962               9           0.025908749               NA                      NA
## 3 1963              16           0.067362748               NA                      NA
## 4 1964              30           0.145088997               NA                      NA
## 5 1965              31           0.225406120               NA                      NA
## 6 1966              45           0.341995492                1                0.066357
#NA should be 0
dmerged[is.na(dmerged)] <- 0 
dmerged$Year<-as.integer(dmerged$Year)
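
The merge step can be sketched in base R as well. This example swaps in merge(..., all = TRUE), which is the base R counterpart of a full join, and uses two small invented tables; years missing from one table come back as NA and are then set to 0.

```r
# toy tables, invented for illustration only
general <- data.frame(Year = c("1960", "1966", "1967"),
                      ArticlesGeneral = c(1, 45, 43),
                      stringsAsFactors = FALSE)
parasite <- data.frame(Year = c("1966", "1968"),
                       ArticlesParasite = c(1, 2),
                       stringsAsFactors = FALSE)

# keep every year that appears in either table
merged <- merge(general, parasite, by = "Year", all = TRUE)

# a missing count means no articles that year
merged[is.na(merged)] <- 0
merged$Year <- as.integer(merged$Year)
merged
```

Both Year columns must have the same type before joining, which is why the code above converts them to character first.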

#let's drop 2019 because the year is incomplete (only 1 record so far), so the point is misleading
dmerged %>% select(.,ArticlesGeneral,ArticlesParasite,Year) %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 1:2) %>% ggplot(aes(Year, value)) + 
    geom_point(aes(colour = factor(id)),size = 1) +
    geom_line(aes(colour = factor(id))) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset",color = "Article Type\n") +
  scale_y_continuous(trans='sqrt')
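
The gather() call above stacks the two article columns into one long table so that ggplot can colour the lines by id. A minimal base R sketch of the same reshaping, on an invented wide table:

```r
# toy wide table, invented for illustration only
wide <- data.frame(Year = c(2016, 2017, 2018),
                   ArticlesGeneral = c(500, 600, 400),
                   ArticlesParasite = c(10, 9, 5))

# one row per Year and series: id names the source column, value holds the count
long <- data.frame(
  Year = rep(wide$Year, times = 2),
  id = rep(c("ArticlesGeneral", "ArticlesParasite"), each = nrow(wide)),
  value = c(wide$ArticlesGeneral, wide$ArticlesParasite),
  stringsAsFactors = FALSE
)
long
```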

#for splines
dmerged<-dmerged %>% filter(.,Year!=2019)
spline_int <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesParasite))
spline_int2 <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesGeneral))
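#splines can overshoot below zero, which breaks the sqrt axis, so clamp both curves at 0
spline_int$y[spline_int$y < 0] <- 0
spline_int2$y[spline_int2$y < 0] <- 0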

ggplot(dmerged) + 
  geom_point(aes(dmerged$Year,dmerged$ArticlesGeneral), col='red',size = 1) +
  geom_point(aes(dmerged$Year,dmerged$ArticlesParasite), col='blue',size = 1) +
  geom_line(data = spline_int2, aes(x,y)) +
  geom_area(data = spline_int2, aes(x,y,fill='blue')) +
  geom_line(data = spline_int, aes(x,y)) +
  geom_area(data = spline_int, aes(x,y,fill='red')) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset") +
  scale_fill_manual(labels = c("Everyone", "Parasites"), values = alpha(c("red", "blue"),.6)) +
  scale_y_continuous(trans='sqrt')

You can make the same sort of bar graphs for the regions, using the file you download from the WoS countries/regions tab.

df_count<-read.table('../broadersearchers/AllozymePerCountryBroadSearch.txt',header=T,row.names=NULL,sep='\t')
head(df_count)
##   Countries_Regions records percent_of_38597
## 1               USA   12395           32.114
## 2             JAPAN    3321            8.604
## 3           GERMANY    2305            5.972
## 4             ITALY    2170            5.622
## 5            FRANCE    2000            5.182
## 6           ENGLAND    1991            5.158
#take top 10
df_count %>% arrange(.,desc(records)) %>% slice(.,1:10) %>% ggplot(., aes(Countries_Regions, records)) +
  geom_bar(stat = "identity",fill=brewer.pal(10, "Spectral")) +
  coord_flip() +
  theme_bw() 

#with everyone else category
vec<-df_count %>% arrange(.,desc(records)) %>% slice(.,11:nrow(.)) %>% select(.,records) %>% sum()
vec2<-data.frame(Countries_Regions='OTHER',records=as.integer(vec))
df_count<-df_count %>% arrange(.,desc(records)) %>% slice(.,1:10) %>% select(.,-percent_of_38597) %>% rbind(.,vec2)
ggplot(df_count, aes(Countries_Regions, records)) +
  geom_bar(stat = "identity",fill=brewer.pal(11, "Spectral")) +
  coord_flip() +
  theme_bw() 
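
The "everyone else" step is easier to follow on a small table. This is a minimal base R sketch with invented counts and a hypothetical top_n of 3 instead of 10: keep the largest rows and sum the rest into a single OTHER row.

```r
# toy country table, invented for illustration only
df <- data.frame(Countries_Regions = c("USA", "JAPAN", "GERMANY", "ITALY", "FRANCE"),
                 records = c(50, 30, 10, 6, 4),
                 stringsAsFactors = FALSE)
top_n <- 3  # hypothetical cut-off; the code above uses 10

# sort from largest to smallest
df <- df[order(-df$records), ]

# keep the top rows and collapse everything else into one row
top <- df[seq_len(top_n), ]
other <- data.frame(Countries_Regions = "OTHER",
                    records = sum(df$records[-seq_len(top_n)]),
                    stringsAsFactors = FALSE)
combined <- rbind(top, other)
combined
```

The total number of records is unchanged by the collapse, which makes a handy sanity check.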

Comparing parasites to general searches via API download

Use this block if you are comparing parasite searches that you downloaded 500 at a time to general search results that you downloaded using the API. The publications per year are counted directly from the API records.

Bring in the file that you downloaded using the API and set it up with our original file from above:

df_api<-read.table('../broadersearchers/AllozymeFromAPI.txt',header=F,row.names=NULL,sep='|',quote="",comment.char="",stringsAsFactors = F)

colnames(df_api) <- c("WoS_id", "Title","Year","Author","Journal","Keywords","Article_type")

head(df_api)
##                WoS_id
## 1 WOS:A1994QA56200011
## 2 WOS:A1982NQ93200006
## 3 WOS:A1997WH72500012
## 4 WOS:A1978FT73500009
## 5 WOS:A1991HG72000058
## 6 WOS:000224155500052
##                                                                                                                                                                                                                                                                                                           Title
## 1                                                                                                                                                                 DETECTION OF A NOVEL LACTATE-DEHYDROGENASE ISOZYME AND AN APPARENT DIFFERENTIATION-ASSOCIATED SHIFT IN ISOZYME PROFILE IN HEPATOMA-CELL LINES
## 2                                                                                                                                                                        SPECIES-SPECIFIC OR ISOZYME-SPECIFIC ENZYME-INHIBITORS .4. DESIGN OF A 2-SITE INHIBITOR OF ADENYLATE KINASE WITH ISOENZYME SELECTIVITY
## 3                                                                                                                                                                                                                                     Inheritance and linkage relationships of nine isozyme loci in wild radish
## 4 USE OF ADENINE-NUCLEOTIDE DERIVATIVES TO ASSESS POTENTIAL OF EXO-ACTIVE-SITE-DIRECTED REAGENTS AS SPECIES-SPECIFIC OR ISOZYME-SPECIFIC ENZYME INACTIVATORS .2. ISOZYME-SPECIFIC INACTIVATION OF A MAMMALIAN ENZYME AND ITS SIGNIFICANCE IN POSSIBLE DESIGN OF FETAL ISOENZYME TARGETED ANTI-NEOPLASTIC AGENTS
## 5                                                                                                                                                                                                                                           INDUCTION, PURIFICATION, AND CHARACTERIZATION OF CYTOCHROME-P450IIE
## 6                                                                                                                                                                                                                                        Isoenzyme polymorphism of some grapevine (Vitis vinifera L.) cultivars
##   Year     Author                                                                                    Journal
## 1 1994    LIU, TZ                                                                             CANCER LETTERS
## 2 1982 HAMPTON, A                                                             JOURNAL OF MEDICINAL CHEMISTRY
## 3 1997 Conner, JK                                                                        JOURNAL OF HEREDITY
## 4 1978 HAMPTON, A                                                             JOURNAL OF MEDICINAL CHEMISTRY
## 5 1991   YANG, CS                                                                      METHODS IN ENZYMOLOGY
## 6 2004 Jahnke, GG PROCEEDINGS OF THE 1ST INTERNATIONAL SYMPOSIUM ON GRAPEVINE GROWING, COMMERCE AND RESEARCH
##         Keywords      Article_type
## 1 HEPATOMA CELLS           Article
## 2           <NA>           Article
## 3           <NA>           Article
## 4           <NA>           Article
## 5           <NA>            Review
## 6      isoenzyme Proceedings Paper
df_api_year<-count(df_api,Year) %>% arrange(.,Year) %>% mutate(PercentPerYearGeneral=cumsum(n)/sum(n)*100)
colnames(df_api_year) <- c("Year","ArticlesGeneral","PercentPerYearGeneral")
df_api_year
## # A tibble: 59 x 3
##     Year ArticlesGeneral PercentPerYearGeneral
##    <int>           <int>                 <dbl>
##  1  1960               1               0.00259
##  2  1962               9               0.0259 
##  3  1963              16               0.0674 
##  4  1964              30               0.145  
##  5  1965              31               0.225  
##  6  1966              45               0.342  
##  7  1967              43               0.453  
##  8  1968              84               0.671  
##  9  1969             103               0.938  
## 10  1970              84               1.16   
## # ... with 49 more rows
write.table(df_api_year,'../broadersearchers/ProductionPerYearForAllozymeGeneralSearch_api',row.names=F,quote=F,sep='\t')
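
Since the API file has one row per paper, the per-year table is just a count of rows per year. A minimal base R sketch of that count, using table() in place of dplyr's count() and an invented Year column:

```r
# toy API records, invented for illustration only: one row per paper
api <- data.frame(Year = c(1994, 1982, 1997, 1994, 1982, 1994))

# count papers per year; table() sorts the years for us
per_year <- as.data.frame(table(Year = api$Year),
                          responseName = "ArticlesGeneral",
                          stringsAsFactors = FALSE)
per_year$Year <- as.integer(per_year$Year)

# running percentage of all papers, as in the code above
per_year$PercentPerYearGeneral <- cumsum(per_year$ArticlesGeneral) / sum(per_year$ArticlesGeneral) * 100
per_year
```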


df_api_year$Year <- as.character(df_api_year$Year)

#have to merge them with earlier dataset
df1<-citations_ana.sum$AnnualProduction 
colnames(df1) <- c("Year", "ArticlesParasite")
df1<-df1 %>%  arrange(.,Year) %>%  mutate(PercentPerYearParasites=cumsum(ArticlesParasite)/sum(ArticlesParasite)*100)
df1$Year <- as.character(df1$Year)

dmerged<-full_join(df_api_year,df1,by='Year')
head(dmerged)
## # A tibble: 6 x 5
##   Year  ArticlesGeneral PercentPerYearGeneral ArticlesParasite PercentPerYearParasites
##   <chr>           <int>                 <dbl>            <int>                   <dbl>
## 1 1960                1               0.00259               NA                 NA     
## 2 1962                9               0.0259                NA                 NA     
## 3 1963               16               0.0674                NA                 NA     
## 4 1964               30               0.145                 NA                 NA     
## 5 1965               31               0.225                 NA                 NA     
## 6 1966               45               0.342                  1                  0.0664
#NA should be 0
dmerged[is.na(dmerged)] <- 0 
dmerged$Year<-as.integer(dmerged$Year)

#let's drop 2019 because the year is incomplete, so the point is misleading
dmerged %>% select(.,ArticlesGeneral,ArticlesParasite,Year) %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 1:2) %>% ggplot(aes(Year, value)) + 
    geom_point(aes(colour = factor(id)),size = 1) +
    geom_line(aes(colour = factor(id))) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset",color = "Article Type\n") +
  scale_y_continuous(trans='sqrt')

#for splines
dmerged<-dmerged %>% filter(.,Year!=2019)
spline_int <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesParasite))
spline_int2 <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesGeneral))
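#splines can overshoot below zero, which breaks the sqrt axis, so clamp both curves at 0
spline_int$y[spline_int$y < 0] <- 0
spline_int2$y[spline_int2$y < 0] <- 0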

ggplot(dmerged) + 
  geom_point(aes(dmerged$Year,dmerged$ArticlesGeneral), col='red',size = 1) +
  geom_point(aes(dmerged$Year,dmerged$ArticlesParasite), col='blue',size = 1) +
  geom_line(data = spline_int2, aes(x,y)) +
  geom_area(data = spline_int2, aes(x,y,fill='blue')) +
  geom_line(data = spline_int, aes(x,y)) +
  geom_area(data = spline_int, aes(x,y,fill='red')) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset") +
  scale_fill_manual(labels = c("Everyone", "Parasites"), values = alpha(c("red", "blue"),.6)) +
  scale_y_continuous(trans='sqrt')

We can also make a word cloud using the keywords from the API download.

head(df_api$Keywords)
## [1] "HEPATOMA CELLS" NA               NA               NA               NA               "isoenzyme"
#treating each word individually, just splitting up multiple keywords into individual rows and removing search term words as before
df_api_Keywords <- df_api %>% select(.,Keywords) %>% separate_rows(.,Keywords,sep=",") %>% na.omit() %>% filter(!grepl('allozyme|electrophoresis|isoenzyme|isozyme|rapd|carbonic anhydrase|aflp|creatine kinase|protein kinase|alkaline phosphatase|cytochrome P450|glutathione S-transferase|alcohol dehydrogenase|lactate dehydrogenase|catalase|aldehyde dehydrogenase|hexokinase|peroxidase|5 alpha-reductase',Keywords,ignore.case = TRUE))

#needs a bit more cleaning for punctuation etc.; ignore the warnings here
#pass the Keywords column itself so that each keyword becomes its own document
forwordcloud.Corpus<-Corpus(VectorSource(df_api_Keywords$Keywords))
forwordcloud.Corpus<- tm_map(forwordcloud.Corpus, removePunctuation)
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, removePunctuation): transformation drops documents
forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, removeNumbers)
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, removeNumbers): transformation drops documents
forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, stripWhitespace)
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, stripWhitespace): transformation drops documents
forwordcloud.Corpus <- tm_map(forwordcloud.Corpus,content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, content_transformer(tolower)): transformation drops documents
#inspect(forwordcloud.Corpus)

#limit word output
wordcloud(forwordcloud.Corpus,max.words = 50,scale=c(2.2,.6))

wordcloud(forwordcloud.Corpus,max.words = 50,colors=brewer.pal(8, "Dark2"),scale=c(2.2,.6))

#may need some more cleaning of terms
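
One source of extra cleaning is that splitting on a bare comma leaves a leading space on every keyword after the first, and the same keyword can appear in different cases. A minimal base R sketch of the split, trim and count, on an invented keywords vector (strsplit() stands in for separate_rows() here):

```r
# toy keyword strings, invented for illustration only
keywords <- c("HEPATOMA CELLS", NA, "isoenzyme, genetic diversity", "Genetic diversity,host")

# split on commas, dropping records with no keywords
parts <- unlist(strsplit(keywords[!is.na(keywords)], ","))

# trim stray spaces and unify case so duplicates collapse
parts <- tolower(trimws(parts))

# remove search-term words as before (short pattern, for the example only)
parts <- parts[!grepl("isoenzyme|isozyme", parts)]

counts <- sort(table(parts), decreasing = TRUE)
counts
```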

Again, we can also treat each keyword phrase as a single term.

#take the 100 most common keywords, then drop any that contain the digit 1
df_api_Keywords_count<-df_api_Keywords %>% count(Keywords) %>% arrange(.,desc(n)) %>% slice(.,1:100) %>% filter(!grepl("1",Keywords,fixed=TRUE))

wordcloud(tolower(df_api_Keywords_count$Keywords),as.numeric(df_api_Keywords_count$n), colors="black",max.words=30,scale=c(2.1,.6))

#again we have some double ups
df_api_Keywords_count<-df_api_Keywords_count %>%  mutate(fixkeyword=sub("Genetic diversity", "genetic diversity", Keywords)) %>% mutate(fixkeyword=sub("Antioxidant enzymes", "antioxidant enzymes", fixkeyword)) %>% group_by(.,fixkeyword) %>% summarise(n = sum(n)) %>% arrange(.,desc(n)) 

#colours and font
wordcloud(tolower(df_api_Keywords_count$fixkeyword),as.numeric(df_api_Keywords_count$n), colors=brewer.pal(8, "Set1"),max.words=30,scale=c(1.9,.6))

wordcloud(tolower(df_api_Keywords_count$fixkeyword),as.numeric(df_api_Keywords_count$n), colors=brewer.pal(8, "Dark2"),vfont=c("script","bold"),max.words=30,rot.per=0,scale=c(1.9,.6))

wordcloud(tolower(df_api_Keywords_count$fixkeyword),as.numeric(df_api_Keywords_count$n), colors=brewer.pal(8, "Dark2"),family = "mono",font = 2,max.words=30,scale=c(1.7,.6))

Comparing your different parasite searches

In your searches you will end up with multiple sets of downloads for parasite sets from WoS (using the 500 at a time approach). Make sure you move these into separate folders for each search term so they don't get mixed up!

To bring in a second set of .bib files, repeat the steps we followed above.
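
One thing to watch when listing the files: the pattern argument of list.files() is a regular expression, not a shell wildcard. The sketch below uses a throwaway temporary folder and invented file names to show a pattern that matches only names ending in .bib.

```r
# make a throwaway folder with a few empty files (names invented for the example)
tmp <- tempfile("bibdemo")
dir.create(tmp)
file.create(file.path(tmp, c("savedrecs.bib", "savedrecs(1).bib",
                             "notes.bib.txt", "readme.txt")))

# "\\.bib$" means: a literal dot, then "bib", at the end of the file name
bib_files <- list.files(path = tmp, pattern = "\\.bib$", full.names = TRUE)
basename(bib_files)

# tidy up the temporary folder
unlink(tmp, recursive = TRUE)
```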

file_list2<-list.files(path='../nonmedicalparasites/',pattern='\\.bib$',full.names=T)
citations_nonmed<-readFilesmod(dput(as.character(file_list2))) 
## c("../nonmedicalparasites//savedrecs(4).bib", "../nonmedicalparasites//savedrecs(5).bib"
## )
citations_nonmed_df <- convert2df(citations_nonmed, dbsource = "isi", format = "bibtex")
## 
## Converting your isi collection into a bibliographic dataframe
## 
## Articles extracted   100 
## Articles extracted   200 
## Articles extracted   300 
## Articles extracted   400 
## Articles extracted   500 
## Articles extracted   600 
## Articles extracted   614 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
citations_nomed_ana <- biblioAnalysis(citations_nonmed_df, sep = ";")
citations_nomed_ana.sum <- summary(object = citations_nomed_ana, k = 100, pause = FALSE)
## 
## 
## Main Information about data
## 
##  Documents                             614 
##  Sources (Journals, Books, etc.)       262 
##  Keywords Plus (ID)                    2009 
##  Author's Keywords (DE)                1472 
##  Period                                1966 - 2018 
##  Average citations per documents       26.86 
## 
##  Authors                               1680 
##  Author Appearances                    2314 
##  Authors of single authored documents  44 
##  Authors of multi authored documents   1636 
## 
##  Documents per Author                  0.365 
##  Authors per Document                  2.74 
##  Co-Authors per Documents              3.77 
##  Collaboration Index                   3 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1966        1
##     1975        2
##     1978        3
##     1979        1
##     1980        2
##     1982        2
##     1983        2
##     1984        2
##     1985        1
##     1986        4
##     1987        6
##     1988        1
##     1989        4
##     1990        3
##     1991       30
##     1992       36
##     1993       32
##     1994       27
##     1995       24
##     1996       25
##     1997       42
##     1998       43
##     1999       32
##     2000       31
##     2001       23
##     2002       27
##     2003       27
##     2004       29
##     2005       15
##     2006       15
##     2007       17
##     2008       14
##     2009       19
##     2010        8
##     2011        7
##     2012        7
##     2013        9
##     2014        4
##     2015       13
##     2016       10
##     2017        9
##     2018        5
## 
## Annual Percentage Growth Rate 4.003523 
## 
## 
## Most Productive Authors
## 
##     Authors        Articles   Authors        Articles Fractionalized
## 1   MATTIUCCI S          19 CLARK CG                            4.00
## 2   NASCETTI G           16 ANDREWS RH                          3.92
## 3   ANDREWS RH           14 PANIAGUA E                          3.75
## 4   PANIAGUA E           12 GOKA K                              3.67
## 5   GATTI S              11 VILAS R                             3.50
## 6   SCAGLIA M            11 BEVERIDGE I                         3.28
## 7   VILAS R              11 PETRI WA                            3.14
## 8   KOBAYASHI S          10 MATTIUCCI S                         3.13
## 9   POZIO E              10 TAKAFUJI A                          3.00
## 10  BEVERIDGE I           9 NAVAJAS M                           2.87
## 11  CHILTON NB            9 VERDYCK P                           2.70
## 12  HAQUE R               9 MIRELMAN D                          2.62
## 13  SNABEL V              9 CHILTON NB                          2.49
## 14  TAKEUCHI T            9 NASCETTI G                          2.49
## 15  BRUNO A               8 MONIS PT                            2.33
## 16  CEVINI C              8 HAQUE R                             2.30
## 17  GOKA K                8 SANMARTIN ML                        2.25
## 18  NAVAJAS M             8 BLAIR D                             2.17
## 19  TACHIBANA H           8 POZIO E                             2.11
## 20  CLARK CG              7 DIAMOND LS                          2.03
## 21  PAOLETTI M            7 EBERT D                             2.03
## 22  SANMARTIN ML          7 OROZCO E                            2.03
## 23  TAKAFUJI A            7 DUFFY JE                            2.00
## 24  CABARET J             6 RANNALA BH                          2.00
## 25  CIPRIANI P            6 CABARET J                           1.98
## 26  MIRELMAN D            6 KOBAYASHI S                         1.88
## 27  PASTEUR N             6 SARGEAUNT PG                        1.87
## 28  PETRI WA              6 LITTLE TJ                           1.83
## 29  RENAUD F              6 GATTI S                             1.81
## 30  SARGEAUNT PG          6 SCAGLIA M                           1.81
## 31  BLAIR D               5 TAKEUCHI T                          1.81
## 32  DIAMOND LS            5 YAMAZAKI Y                          1.75
## 33  EBERT D               5 SNABEL V                            1.75
## 34  MAYRHOFER G           5 WILLIAMS JE                         1.70
## 35  MONIS PT              5 JACKSON TFHG                        1.68
## 36  PHILLIPS CB           5 STOUTHAMER R                        1.67
## 37  ROMAN B               5 PHILLIPS CB                         1.59
## 38  SATOVIC Z             5 EDWARDS DD                          1.53
## 39  TSAGKARAKOU A         5 BURDON JJ                           1.50
## 40  WEBB SC               5 CLAY K                              1.50
## 41  WILLIAMS JE           5 CROFT BA                            1.50
## 42  CASTILLO P            4 ILINE II                            1.45
## 43  CUBERO JI             4 PASTEUR N                           1.45
## 44  DARDE ML              4 ANTOLIN MF                          1.33
## 45  DUBINSKY P            4 BOHONAK AJ                          1.33
## 46  DUCHENE M             4 FERGUSON DJP                        1.33
## 47  EDWARDS DD            4 GOTOH T                             1.33
## 48  GOTOH T               4 NEVO E                              1.33
## 49  HALL A                4 OSAKABE M                           1.33
## 50  ILINE II              4 TOMAVO S                            1.33
## 51  JACKSON TFHG          4 CEVINI C                            1.33
## 52  LAROSA G              4 D'AMELIO S                          1.31
## 53  LITTLE TJ             4 TACHIBANA H                         1.31
## 54  OROZCO E              4 THOMPSON RCA                        1.25
## 55  OSAKABE M             4 MAYRHOFER G                         1.23
## 56  RUBIALES D            4 BRUNO A                             1.22
## 57  STOUTHAMER R          4 STANLEY SL                          1.20
## 58  SUZUKI J              4 VRIJENHOEK RC                       1.20
## 59  THOMPSON RCA          4 CLOSE RL                            1.17
## 60  TORRES AM             4 DARDE ML                            1.17
## 61  VERDYCK P             4 PAOLETTI M                          1.16
## 62  VOVLAS N              4 RENAUD F                            1.16
## 63  WIEDERMANN G          4 TSAGKARAKOU A                       1.12
## 64  YANAGI T              4 BURCHARD GD                         1.08
## 65  ABAUNZA P             3 DUCHENE M                           1.03
## 66  AGUIRRE A             3 BARRETT J                           1.00
## 67  AVANZATI AM           3 BIASIOLO A                          1.00
## 68  BARATTI M             3 BLANC DS                            1.00
## 69  BELLISARIO B          3 BRUCKNER DA                         1.00
## 70  BERECZKI J            3 BRYANT C                            1.00
## 71  BERNINI F             3 CHACIN-BONILLA L                    1.00
## 72  BERNUZZI AM           3 CLOUTMAN DG                         1.00
## 73  BINDER M              3 DESSER SS                           1.00
## 74  BOOMSMA JJ            3 DILLS WL                            1.00
## 75  BOUTEILLE B           3 DUNLEY JE                           1.00
## 76  BRACHA R              3 FORD BA                             1.00
## 77  BRISCOE DA            3 FUSU L                              1.00
## 78  BULLINI L             3 GARDNER JPA                         1.00
## 79  BURCHARD GD           3 GARDNER SL                          1.00
## 80  CARNEIRO RMDG         3 GRANT WN                            1.00
## 81  CIANCHI R             3 GREENSTONE MH                       1.00
## 82  CLOSE RL              3 HAASE M                             1.00
## 83  CROFT BA              3 HAFNER MS                           1.00
## 84  D'AMELIO S            3 HALL GS                             1.00
## 85  DEMEEUS T             3 HINOMOTO N                          1.00
## 86  DURAND P              3 HOSHIZAKI S                         1.00
## 87  EISENBACK JD          3 HSIAO TH                            1.00
## 88  EY PL                 3 JAZAYERI JA                         1.00
## 89  GIBSON DI             3 JEROME CA                           1.00
## 90  GONZALEZRUIZ A        3 JOHANNESEN J                        1.00
## 91  GUHL F                3 JOHNSON SG                          1.00
## 92  HANZELOVA V           3 KANG NJ                             1.00
## 93  HEINZE J              3 KAZMER DJ                           1.00
## 94  KARSSEN G             3 KITASHIMA Y                         1.00
## 95  KITASHIMA Y           3 KOLLAR A                            1.00
## 96  LA ROSA G             3 LEBER AL                            1.00
## 97  LAGNEL J              3 LEUCHTMANN A                        1.00
## 98  LYMBERY AJ            3 LODE T                              1.00
## 99  MARCHI L              3 LYMBERY AJ                          1.00
## 100 MELONI BP             3 MARTIN FN                           1.00
## 
## 
## Top manuscripts per citations
## 
##                                       Paper           TC TCperYear
## 1   LINHART YB, 1996, ANNU REV ECOL SYST             798     36.27
## 2   DIAMOND LS, 1993, J EUKARYOT MICROBIOL           381     15.24
## 3   ARNAUD-HAOND S, 2007, MOL ECOL                   336     30.55
## 4   SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG-a 298      7.45
## 5   DARDE ML, 1992, J PARASITOL                      198      7.62
## 6   MATTIUCCI S, 1997, J PARASITOL                   197      9.38
## 7   MONIS PT, 2003, INFECT GENET EVOL                189     12.60
## 8   MOLLER AP, 1998, BEHAV ECOL SOCIOBIOL            188      9.40
## 9   BLACK WC, 1992, BULL ENTOMOL RES                 187      7.19
## 10  CLARK CG, 1991, MOL BIOCHEM PARASITOL            171      6.33
## 11  HAQUE R, 1998, J CLIN MICROBIOL                  166      8.30
## 12  SOLTIS DE, 1991, AM J BOT-a                      166      6.15
## 13  DYBDAHL MF, 1996, EVOLUTION                      165      7.50
## 14  ZARLENGA DS, 1999, INT J PARASIT                 162      8.53
## 15  LOXDALE HD, 1998, BULL ENTOMOL RES               154      7.70
## 16  SCHWARZ D, 2005, NATURE                          151     11.62
## 17  BURDON JJ, 1993, ANNU REV PHYTOPATHOL            140      5.60
## 18  EBERT D, 1998, PROC R SOC B-BIOL SCI             137      6.85
## 19  LESSA EP, 1998, MOL PHYLOGENET EVOL              126      6.30
## 20  THOMAS Y, 2003, EVOLUTION                        124      8.27
## 21  HAQUE R, 1995, J CLIN MICROBIOL                  120      5.22
## 22  BAYMAN P, 1991, CAN J BOT -REV CAN BOT           112      4.15
## 23  SMITH MA, 2008, MOL ECOL RESOUR                  111     11.10
## 24  NEFF BD, 2001, EVOLUTION                         111      6.53
## 25  ZIJLSTRA C, 1995, PHYTOPATHOLOGY                 111      4.83
## 26  TANNICH E, 1991, J CLIN MICROBIOL                110      4.07
## 27  NEVO E, 1998, GENET RESOUR CROP EVOL             107      5.35
## 28  MIRELMAN D, 1986, INFECT IMMUN                   104      3.25
## 29  SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG   104      2.60
## 30  ACUNASOTO R, 1993, AM J TROP MED HYG             103      4.12
## 31  VALENTINI A, 2006, J PARASITOL                   102      8.50
## 32  NAVAJAS M, 2000, EXP APPL ACAROL-a               102      5.67
## 33  NADLER SA, 2011, PARASITOLOGY                     99     14.14
## 34  MARTIN FN, 2000, MYCOLOGIA                        95      5.28
## 35  OUDEMANS P, 1991, MYCOL RES                       91      3.37
## 36  MIRELMAN D, 1986, EXP PARASITOL                   91      2.84
## 37  MATTIUCCI S, 2002, SYST PARASITOL                 88      5.50
## 38  MATTIUCCI S, 2004, J FISH BIOL                    83      5.93
## 39  GELTER HP, 1992, BEHAV ECOL SOCIOBIOL             82      3.15
## 40  SUZUKI T, 2005, CURR MED CHEM                     78      6.00
## 41  BANDI C, 1995, PARASITOLOGY                       78      3.39
## 42  LITTLE TJ, 1999, J ANIM ECOL                      77      4.05
## 43  DUNCAN AB, 2007, EVOLUTION                        76      6.91
## 44  CARMONA JA, 1997, GENETICS                        76      3.62
## 45  JAMES TY, 1999, EVOLUTION                         75      3.95
## 46  MELONI BP, 1995, J PARASITOL                      73      3.17
## 47  GONZALEZRUIZ A, 1994, J CLIN PATHOL               73      3.04
## 48  MOPPER S, 2000, ECOLOGY                           72      4.00
## 49  DELMOTTE F, 1999, HEREDITY                        72      3.79
## 50  HAQUE R, 1993, J INFECT DIS                       71      2.84
## 51  FERGUSON DJP, 2004, INT J PARASIT                 69      4.93
## 52  SOLTIS DE, 1991, AM J BOT                         68      2.52
## 53  TACHIBANA H, 1991, J CLIN MICROBIOL               67      2.48
## 54  SARGEAUNT PG, 1980, TRANS ROY SOC TROP MED HYG    67      1.76
## 55  POZIO E, 2002, INT J PARASIT                      66      4.12
## 56  GARDNER JPA, 1994, ARCH HYDROBIOL                 65      2.71
## 57  PATERSON AM, 2000, SYST BIOL                      64      3.56
## 58  MATTIUCCI S, 2014, J PARASITOL                    62     15.50
## 59  NAGANO I, 1999, INT J PARASIT                     62      3.26
## 60  DARDE ML, 1998, J CLIN MICROBIOL                  62      3.10
## 61  CODJIA V, 1993, ACTA TROP                         62      2.48
## 62  ABAUNZA P, 2008, FISH RES                         61      6.10
## 63  HAAG CR, 2005, GENETICS                           61      4.69
## 64  ANTOLIN MF, 1999, RES POPUL ECOL                  61      3.21
## 65  CLAY K, 1993, AGRIC ECOSYST ENVIRON               61      2.44
## 66  IRUSEN EM, 1992, CLIN INFECT DIS                  60      2.31
## 67  BOLLINGER EK, 1991, BEHAV ECOL SOCIOBIOL          59      2.19
## 68  GREENSTONE MH, 2006, BULL ENTOMOL RES             58      4.83
## 69  FERGUSON DJP, 2002, INT J PARASIT                 58      3.62
## 70  ASIEGBU FO, 1994, PHYSIOL MOL PLANT PATHOL        58      2.42
## 71  MATTIUCCI S, 2001, INT J PARASIT                  57      3.35
## 72  MEAGHER S, 1999, EVOLUTION                        57      3.00
## 73  DUFFY JE, 1996, EVOLUTION                         57      2.59
## 74  DUFFY JE, 1993, MAR BIOL                          57      2.28
## 75  FEDER JL, 1997, EVOLUTION                         56      2.67
## 76  ANDREWS RH, 1992, PARASITOLOGY                    56      2.15
## 77  QUINN TP, 1987, CAN J FISH AQUAT SCI              56      1.81
## 78  EMELIANOV I, 2003, J EVOL BIOL                    55      3.67
## 79  MATTIUCCI S, 2008, FISH RES                       54      5.40
## 80  MATTIUCCI S, 2009, SYST PARASITOL                 53      5.89
## 81  VILLEMANT C, 2007, SYST ENTOMOL                   53      4.82
## 82  CHILTON NB, 1992, INT J PARASIT-a                 53      2.04
## 83  HUANG HW, 1998, AM J BOT                          52      2.60
## 84  MOLBO D, 1996, PROC R SOC B-BIOL SCI              51      2.32
## 85  CHABOUDEZ P, 1995, OECOLOGIA                      51      2.22
## 86  MATTIUCCI S, 2007, VET PARASITOL                  50      4.55
## 87  GRANT WN, 1994, INT J PARASIT                     50      2.08
## 88  TOMAVO S, 2001, INT J PARASIT                     49      2.88
## 89  MILGROOM MG, 1995, PHYTOPATHOLOGY                 49      2.13
## 90  ALS TD, 2002, ECOL ENTOMOL                        48      3.00
## 91  CHEN W, 1992, PHYTOPATHOLOGY                      48      1.85
## 92  JEROME CA, 2002, MOL ECOL                         47      2.94
## 93  UESUGI R, 2002, J ECON ENTOMOL                    46      2.88
## 94  HEDRICK PW, 1998, EVOLUTION                       46      2.30
## 95  SCHULTZ TR, 1998, INSECT SOC                      46      2.30
## 96  BRITTEN D, 1997, J CLIN MICROBIOL                 46      2.19
## 97  NEVO E, 1994, HEREDITY                            45      1.88
## 98  MITCHELL SE, 2004, ECOL LETT                      44      3.14
## 99  WEEKS AR, 1995, EXP APPL ACAROL                   44      1.91
## 100 BURCH DJ, 1991, J CLIN MICROBIOL                  44      1.63
## 
## 
## Most Productive Countries (of corresponding authors)
## 
##         Country   Articles   Freq SCP MCP MCP_Ratio
## 1  USA                 102 0.1735  83  19    0.1863
## 2  UNITED KINGDOM       49 0.0833  31  18    0.3673
## 3  ITALY                45 0.0765  27  18    0.4000
## 4  FRANCE               44 0.0748  30  14    0.3182
## 5  AUSTRALIA            42 0.0714  38   4    0.0952
## 6  JAPAN                41 0.0697  35   6    0.1463
## 7  SPAIN                31 0.0527  18  13    0.4194
## 8  GERMANY              19 0.0323  14   5    0.2632
## 9  BRAZIL               18 0.0306  14   4    0.2222
## 10 CANADA               18 0.0306  14   4    0.2222
## 11 MEXICO               13 0.0221  11   2    0.1538
## 12 SWITZERLAND          12 0.0204   8   4    0.3333
## 13 NEW ZEALAND          10 0.0170  10   0    0.0000
## 14 AUSTRIA               9 0.0153   7   2    0.2222
## 15 INDIA                 9 0.0153   9   0    0.0000
## 16 SLOVAKIA              8 0.0136   0   8    1.0000
## 17 BELGIUM               7 0.0119   5   2    0.2857
## 18 CHINA                 7 0.0119   6   1    0.1429
## 19 ISRAEL                7 0.0119   5   2    0.2857
## 20 KENYA                 7 0.0119   4   3    0.4286
## 21 NETHERLANDS           7 0.0119   2   5    0.7143
## 22 SOUTH AFRICA          6 0.0102   5   1    0.1667
## 23 EGYPT                 5 0.0085   3   2    0.4000
## 24 POLAND                5 0.0085   3   2    0.4000
## 25 SWEDEN                5 0.0085   5   0    0.0000
## 26 TURKEY                5 0.0085   5   0    0.0000
## 27 DENMARK               4 0.0068   2   2    0.5000
## 28 FINLAND               4 0.0068   2   2    0.5000
## 29 HUNGARY               4 0.0068   4   0    0.0000
## 30 THAILAND              4 0.0068   1   3    0.7500
## 31 VENEZUELA             4 0.0068   4   0    0.0000
## 32 ARGENTINA             3 0.0051   2   1    0.3333
## 33 CZECH REPUBLIC        3 0.0051   1   2    0.6667
## 34 SERBIA                3 0.0051   2   1    0.3333
## 35 TAIWAN                3 0.0051   0   3    1.0000
## 36 BANGLADESH            2 0.0034   1   1    0.5000
## 37 BULGARIA              2 0.0034   2   0    0.0000
## 38 CHILE                 2 0.0034   0   2    1.0000
## 39 IRAN                  2 0.0034   1   1    0.5000
## 40 IRELAND               2 0.0034   1   1    0.5000
## 41 KOREA                 2 0.0034   2   0    0.0000
## 42 PORTUGAL              2 0.0034   0   2    1.0000
## 43 ROMANIA               2 0.0034   2   0    0.0000
## 44 BAHAMAS               1 0.0017   0   1    1.0000
## 45 COLOMBIA              1 0.0017   1   0    0.0000
## 46 CROATIA               1 0.0017   0   1    1.0000
## 47 ESTONIA               1 0.0017   1   0    0.0000
## 48 GEORGIA               1 0.0017   0   1    1.0000
## 49 MAURITANIA            1 0.0017   1   0    0.0000
## 50 RUSSIA                1 0.0017   0   1    1.0000
## 51 SLOVENIA              1 0.0017   1   0    0.0000
## 52 URUGUAY               1 0.0017   0   1    1.0000
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  USA                       4452                     43.65
## 2  UNITED KINGDOM            1769                     36.10
## 3  FRANCE                    1492                     33.91
## 4  ITALY                     1337                     29.71
## 5  AUSTRALIA                 1209                     28.79
## 6  JAPAN                      699                     17.05
## 7  CANADA                     545                     30.28
## 8  SPAIN                      481                     15.52
## 9  SWITZERLAND                390                     32.50
## 10 ISRAEL                     375                     53.57
## 11 GERMANY                    355                     18.68
## 12 PORTUGAL                   338                    169.00
## 13 NETHERLANDS                174                     24.86
## 14 SWEDEN                     166                     33.20
## 15 BRAZIL                     160                      8.89
## 16 NEW ZEALAND                155                     15.50
## 17 SOUTH AFRICA               142                     23.67
## 18 MEXICO                     140                     10.77
## 19 DENMARK                    127                     31.75
## 20 URUGUAY                    126                    126.00
## 21 KENYA                      115                     16.43
## 22 AUSTRIA                     96                     10.67
## 23 BANGLADESH                  77                     38.50
## 24 FINLAND                     77                     19.25
## 25 SLOVAKIA                    65                      8.12
## 26 TURKEY                      64                     12.80
## 27 POLAND                      54                     10.80
## 28 INDIA                       47                      5.22
## 29 VENEZUELA                   42                     10.50
## 30 THAILAND                    39                      9.75
## 31 KOREA                       34                     17.00
## 32 IRELAND                     31                     15.50
## 33 TAIWAN                      28                      9.33
## 34 BELGIUM                     27                      3.86
## 35 ARGENTINA                   24                      8.00
## 36 IRAN                        24                     12.00
## 37 HUNGARY                     23                      5.75
## 38 CHINA                       18                      2.57
## 39 EGYPT                       18                      3.60
## 40 GEORGIA                     17                     17.00
## 41 CZECH REPUBLIC              16                      5.33
## 42 SERBIA                      13                      4.33
## 43 ROMANIA                     12                      6.00
## 44 BAHAMAS                      8                      8.00
## 45 COLOMBIA                     5                      5.00
## 46 ESTONIA                      5                      5.00
## 47 CHILE                        3                      1.50
## 48 SLOVENIA                     2                      2.00
## 49 BULGARIA                     1                      0.50
## 50 CROATIA                      1                      1.00
## 51 MAURITANIA                   1                      1.00
## 52 RUSSIA                       0                      0.00
## 
## 
## Most Relevant Sources
## 
##                                                                                    Sources        Articles
## 1   INTERNATIONAL JOURNAL FOR PARASITOLOGY                                                              31
## 2   PARASITOLOGY RESEARCH                                                                               24
## 3   JOURNAL OF PARASITOLOGY                                                                             23
## 4   PARASITOLOGY                                                                                        17
## 5   EVOLUTION                                                                                           13
## 6   JOURNAL OF CLINICAL MICROBIOLOGY                                                                    10
## 7   SYSTEMATIC PARASITOLOGY                                                                             10
## 8   ANNALS OF THE ENTOMOLOGICAL SOCIETY OF AMERICA                                                       9
## 9   APPLIED ENTOMOLOGY AND ZOOLOGY                                                                       9
## 10  EXPERIMENTAL AND APPLIED ACAROLOGY                                                                   9
## 11  TRANSACTIONS OF THE ROYAL SOCIETY OF TROPICAL MEDICINE AND HYGIENE                                   9
## 12  VETERINARY PARASITOLOGY                                                                              9
## 13  BULLETIN OF ENTOMOLOGICAL RESEARCH                                                                   8
## 14  EXPERIMENTAL \\& APPLIED ACAROLOGY                                                                   8
## 15  MOLECULAR ECOLOGY                                                                                    8
## 16  PHYTOPATHOLOGY                                                                                       8
## 17  AMERICAN JOURNAL OF TROPICAL MEDICINE AND HYGIENE                                                    7
## 18  PLANT DISEASE                                                                                        7
## 19  BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY                                                            6
## 20  EXPERIMENTAL PARASITOLOGY                                                                            6
## 21  HEREDITY                                                                                             6
## 22  JOURNAL OF MEDICAL ENTOMOLOGY                                                                        6
## 23  JOURNAL OF NEMATOLOGY                                                                                6
## 24  NEMATOLOGY                                                                                           6
## 25  PARASITE-JOURNAL DE LA SOCIETE FRANCAISE DE PARASITOLOGIE                                            6
## 26  ACTA TROPICA                                                                                         5
## 27  AMERICAN JOURNAL OF BOTANY                                                                           5
## 28  HELMINTHOLOGIA                                                                                       5
## 29  JOURNAL OF PROTOZOOLOGY                                                                              5
## 30  MYCOLOGICAL RESEARCH                                                                                 5
## 31  ANNALS OF TROPICAL MEDICINE AND PARASITOLOGY                                                         4
## 32  ARCHIVES OF MEDICAL RESEARCH                                                                         4
## 33  BIOCHEMICAL SYSTEMATICS AND ECOLOGY                                                                  4
## 34  CANADIAN JOURNAL OF ZOOLOGY-REVUE CANADIENNE DE ZOOLOGIE                                             4
## 35  ICOPA IX - 9TH INTERNATIONAL CONGRESS OF PARASITOLOGY                                                4
## 36  JOURNAL OF EUKARYOTIC MICROBIOLOGY                                                                   4
## 37  JOURNAL OF EVOLUTIONARY BIOLOGY                                                                      4
## 38  MOLECULAR AND BIOCHEMICAL PARASITOLOGY                                                               4
## 39  PLANT PATHOLOGY                                                                                      4
## 40  PLOS ONE                                                                                             4
## 41  ACTA PARASITOLOGICA                                                                                  3
## 42  BEHAVIORAL ECOLOGY AND SOCIOBIOLOGY                                                                  3
## 43  BIOLOGICAL CONTROL                                                                                   3
## 44  COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY B-BIOCHEMISTRY \\& MOLECULAR   BIOLOGY                       3
## 45  ENTOMOLOGIA EXPERIMENTALIS ET APPLICATA                                                              3
## 46  EUROPEAN JOURNAL OF ENTOMOLOGY                                                                       3
## 47  FISHERIES RESEARCH                                                                                   3
## 48  INDIAN JOURNAL OF MEDICAL RESEARCH SECTION A-INFECTIOUS DISEASES                                     3
## 49  INFECTIOUS AGENTS AND DISEASE-REVIEWS ISSUES AND COMMENTARY                                          3
## 50  INSECTES SOCIAUX                                                                                     3
## 51  INTERNATIONAL JOURNAL OF FOOD MICROBIOLOGY                                                           3
## 52  JAPANESE JOURNAL OF APPLIED ENTOMOLOGY AND ZOOLOGY                                                   3
## 53  JOURNAL OF INFECTIOUS DISEASES                                                                       3
## 54  JOURNAL OF NATURAL HISTORY                                                                           3
## 55  JOURNAL OF ZOOLOGICAL SYSTEMATICS AND EVOLUTIONARY RESEARCH                                          3
## 56  MYCOLOGIA                                                                                            3
## 57  NEMATROPICA                                                                                          3
## 58  PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES                                               3
## 59  ARCHIVOS DE INVESTIGACION MEDICA                                                                     2
## 60  AUSTRALIAN JOURNAL OF ZOOLOGY                                                                        2
## 61  BELGIAN JOURNAL OF ZOOLOGY                                                                           2
## 62  BIOCHEMICAL GENETICS                                                                                 2
## 63  CANADIAN JOURNAL OF BOTANY-REVUE CANADIENNE DE BOTANIQUE                                             2
## 64  CLINICAL MICROBIOLOGY REVIEWS                                                                        2
## 65  ECOLOGICAL ENTOMOLOGY                                                                                2
## 66  ENVIRONMENTAL BIOLOGY OF FISHES                                                                      2
## 67  EUPHYTICA                                                                                            2
## 68  GENETICA                                                                                             2
## 69  GENETICS                                                                                             2
## 70  IN VITRO CELLULAR \\& DEVELOPMENTAL BIOLOGY-ANIMAL                                                   2
## 71  INFECTION AND IMMUNITY                                                                               2
## 72  INFECTION GENETICS AND EVOLUTION                                                                     2
## 73  INTERNATIONAL JOURNAL OF ACAROLOGY                                                                   2
## 74  INVESTIGACION CLINICA                                                                                2
## 75  JOURNAL OF FISH BIOLOGY                                                                              2
## 76  JOURNAL OF GENERAL MICROBIOLOGY                                                                      2
## 77  JOURNAL OF HELMINTHOLOGY                                                                             2
## 78  JOURNAL OF HEREDITY                                                                                  2
## 79  JOURNAL OF MAMMALOGY                                                                                 2
## 80  JOURNAL OF ZOO AND WILDLIFE MEDICINE                                                                 2
## 81  MARINE BIOLOGY                                                                                       2
## 82  MARINE ECOLOGY PROGRESS SERIES                                                                       2
## 83  MEMORIAS DO INSTITUTO OSWALDO CRUZ                                                                   2
## 84  PARASITOLOGY INTERNATIONAL                                                                           2
## 85  PHYSIOLOGICAL AND MOLECULAR PLANT PATHOLOGY                                                          2
## 86  PROCEEDINGS OF THE ENTOMOLOGICAL SOCIETY OF WASHINGTON                                               2
## 87  RUSSIAN JOURNAL OF NEMATOLOGY                                                                        2
## 88  THEORETICAL AND APPLIED GENETICS                                                                     2
## 89  16TH INTERNATIONAL SCIENTIFIC COLLOQUIUM ON COFFEE VOLS I \\& II                                     1
## 90  2010 4TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL   ENGINEERING (ICBBE 2010)        1
## 91  ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY                                          1
## 92  ACTA PROTOZOOLOGICA                                                                                  1
## 93  ACTA THERIOLOGICA                                                                                    1
## 94  ACTA ZOOLOGICA BULGARICA                                                                             1
## 95  AFRICAN ENTOMOLOGY                                                                                   1
## 96  AFRICAN JOURNAL OF BIOTECHNOLOGY                                                                     1
## 97  AFRICAN ZOOLOGY                                                                                      1
## 98  AGRICULTURE ECOSYSTEMS \\& ENVIRONMENT                                                               1
## 99  AMERICAN JOURNAL OF ENOLOGY AND VITICULTURE                                                          1
## 100 AMERICAN MIDLAND NATURALIST                                                                          1
## 
## 
## Most Relevant Keywords
## 
##        Author Keywords (DE)      Articles              Keywords-Plus (ID)     Articles
## 1   ALLOZYMES                          60 DIFFERENTIATION                           71
## 2   ALLOZYME                           27 POPULATIONS                               56
## 3   ELECTROPHORESIS                    26 IDENTIFICATION                            53
## 4   TAXONOMY                           19 EVOLUTION                                 47
## 5   ENTAMOEBA HISTOLYTICA              18 DNA                                       31
## 6   GENE FLOW                          16 ELECTROPHORESIS                           29
## 7   RESISTANCE                         15 ISOENZYME PATTERNS                        27
## 8   ALLOZYME ELECTROPHORESIS           13 DIVERSITY                                 26
## 9   ENTAMOEBA DISPAR                   13 PARASITES                                 25
## 10  PHYLOGENY                          13 ALLOZYME                                  24
## 11  SPECIATION                         13 NATURAL-POPULATIONS                       23
## 12  ESTERASE                           12 VARIABILITY                               23
## 13  MORPHOLOGY                         12 AMEBIASIS                                 21
## 14  POPULATION GENETICS                12 HOST                                      21
## 15  ISOENZYME                          11 RESISTANCE                                19
## 16  ISOZYMES                           11 STRAINS                                   19
## 17  SYSTEMATICS                        11 VIRULENCE                                 19
## 18  TETRANYCHUS URTICAE                11 POPULATION-STRUCTURE                      18
## 19  GENETIC DIVERSITY                  10 MITOCHONDRIAL-DNA                         17
## 20  GENETIC STRUCTURE                  10 POPULATION                                17
## 21  GENETIC VARIATION                  10 INFECTION                                 16
## 22  PARASITE                           10 SEQUENCES                                 16
## 23  PCR                                10 SYSTEMATICS                               15
## 24  POPULATION STRUCTURE               10 ASCARIDOIDEA                              14
## 25  DIAGNOSIS                           9 DISTANCE                                  14
## 26  MITOCHONDRIAL DNA                   9 ENTAMOEBA-HISTOLYTICA                     14
## 27  MORPHOMETRICS                       9 FLOW                                      14
## 28  POLYMORPHISM                        9 GENETIC-VARIATION                         14
## 29  AMEBIASIS                           8 POLYMORPHISM                              14
## 30  GENETICS                            8 MARKERS                                   13
## 31  HYMENOPTERA                         8 PATTERNS                                  13
## 32  RAPD                                8 POLYMERASE CHAIN-REACTION                 13
## 33  GENETIC VARIABILITY                 7 POPULATION-GENETICS                       13
## 34  HETEROZYGOSITY                      7 SPECIATION                                13
## 35  HYBRIDIZATION                       7 ZYMODEMES                                 13
## 36  ISOENZYMES                          7 BRAZIL                                    12
## 37  LOCAL ADAPTATION                    7 DIAGNOSIS                                 12
## 38  POPULATION                          7 GENETIC DIVERSITY                         12
## 39  RFLP                                7 GENETIC-STRUCTURE                         12
## 40  TRICHINELLA                         7 RIBOSOMAL DNA                             12
## 41  VARIATION                           7 COEVOLUTION                               11
## 42  ZYMODEME                            7 DISPAR                                    11
## 43  ENTAMOEBA-HISTOLYTICA               6 EXPRESSION                                11
## 44  GENETIC                             6 PARASITE                                  11
## 45  ISOZYME                             6 DIVERGENCE                                10
## 46  MOLECULAR                           6 GENUS                                     10
## 47  PARASITES                           6 HETEROZYGOSITY                            10
## 48  SIBLING SPECIES                     6 HOMOSEXUAL MEN                            10
## 49  ACARI                               5 HYMENOPTERA                               10
## 50  BIOLOGICAL CONTROL                  5 ORIGIN                                    10
## 51  EPIDEMIOLOGY                        5 RHAGOLETIS-POMONELLA                      10
## 52  GENETIC DIFFERENTIATION             5 SELECTION                                 10
## 53  GIARDIA                             5 ACARI                                      9
## 54  IDENTIFICATION                      5 AXENIC CULTIVATION                         9
## 55  MELOIDOGYNE                         5 CRYPHONECTRIA-PARASITICA                   9
## 56  MICROSATELLITES                     5 DNA PROBES                                 9
## 57  PARASITISM                          5 DROSOPHILA-MELANOGASTER                    9
## 58  PARTHENOGENESIS                     5 FISH                                       9
## 59  PATHOGENICITY                       5 ISOZYME                                    9
## 60  ROOT-KNOT NEMATODE                  5 NEMATODA                                   9
## 61  SELECTION                           5 PARASITISM                                 9
## 62  SEXUAL REPRODUCTION                 5 PLANT-PARASITIC NEMATODES                  9
## 63  SPECIES                             5 RIBOSOMAL-RNA                              9
## 64  TOXOPLASMA GONDII                   5 SURFACE-ANTIGEN                            9
## 65  ALLOZYME VARIATION                  4 UNITED-STATES                              9
## 66  CESTODA                             4 BIOLOGICAL-CONTROL                         8
## 67  CHARACTERIZATION                    4 BIOLOGY                                    8
## 68  COEVOLUTION                         4 ELECTROPHORETIC ISOENZYME PATTERNS         8
## 69  CONSERVATION                        4 ENZYME PHENOTYPES                          8
## 70  COSPECIATION                        4 ENZYMES                                    8
## 71  CRYPTIC SPECIES                     4 GENE FLOW                                  8
## 72  DIFFERENTIATION                     4 PCR                                        8
## 73  EVOLUTION                           4 POLYMERASE-CHAIN-REACTION                  8
## 74  GENETIC DISTANCE                    4 POLYMORPHISMS                              8
## 75  HOST RACES                          4 SEXUAL REPRODUCTION                        8
## 76  HOST RANGE                          4 STRAIN                                     8
## 77  HOST SPECIFICITY                    4 ALLOZYME ANALYSIS                          7
## 78  INSECTA                             4 ALLOZYME DATA                              7
## 79  INTRASPECIFIC VARIATION             4 AMPLIFICATION                              7
## 80  IXODES RICINUS                      4 ASCARIDIDA                                 7
## 81  MALATE DEHYDROGENASE                4 ATLANTIC                                   7
## 82  MTDNA                               4 DIPTERA                                    7
## 83  NEMATODE                            4 GENETIC DIFFERENTIATION                    7
## 84  POPULATION GENETIC STRUCTURE        4 GROWTH                                     7
## 85  RED QUEEN                           4 INFECTIONS                                 7
## 86  REPRODUCTIVE ISOLATION              4 INSECTS                                    7
## 87  SHEEP                               4 ISOENZYME ANALYSIS                         7
## 88  AFLP                                3 IXODIDAE                                   7
## 89  COLONIZATION                        3 JAPAN                                      7
## 90  DIAGNOSTICS                         3 LECTIN                                     7
## 91  DIPTERA                             3 LEPIDOPTERA                                7
## 92  DISTRIBUTION                        3 LOCAL ADAPTATION                           7
## 93  DIVERSITY                           3 MONOCLONAL-ANTIBODIES                      7
## 94  E. HISTOLYTICA                      3 NONPATHOGENIC ENTAMOEBA-HISTOLYTICA        7
## 95  ENZYME ELECTROPHORESIS              3 PHYLOGENETIC-RELATIONSHIPS                 7
## 96  FISH                                3 PLANT                                      7
## 97  GENETIC DIVERGENCE                  3 PROTEIN                                    7
## 98  GENETIC POLYMORPHISM                3 RAPD MARKERS                               7
## 99  GEOGRAPHIC                          3 SIMPLEX COMPLEX ASCARIDIDA                 7
## 100 GEOGRAPHICAL VARIATION              3 SPIRALIS                                   7
#bar chart of top 10 countries
df_count_nomed<-data.frame(Country=as.character(citations_nomed_ana.sum$MostProdCountries$`Country  `),Article_count=as.integer(citations_nomed_ana.sum$MostProdCountries$Articles)) %>% slice(.,1:10)

ggplot(df_count_nomed, aes(Country, Article_count)) +
  geom_bar(stat = "identity",fill=brewer.pal(10, "Spectral")) +
  coord_flip() +
  theme_bw() 

#with an 'everyone else' (OTHER) category
vec<-as.data.frame(citations_nomed_ana$Countries,stringsAsFactors = F) %>% filter(!Tab %in% trimws(as.character(df_count_nomed$Country),which = c("both", "left", "right"))) %>% select(.,Freq) %>% sum()
vec2<-data.frame(Country='OTHER',Article_count=as.integer(vec))
df_count_nomed<-rbind(df_count_nomed,vec2)

ggplot(df_count_nomed, aes(Country, Article_count)) +
  geom_bar(stat = "identity",fill=brewer.pal(11, "Spectral")) +
  coord_flip() +
  theme_bw() 

#write it out as a table
write.table(citations_nomed_ana.sum$MostProdCountries,'../nonmedicalparasites/TopProducingCountriesForAllozymeNonMediacalParasiteSearch',row.names=F,quote=F,sep='\t')

#to see when XX % of papers were published
table<-citations_nomed_ana.sum$AnnualProduction %>% mutate(cumsum=cumsum(Articles),cumper=cumsum(Articles)/sum(Articles)*100)
table
##    Year    Articles cumsum      cumper
## 1     1966        1      1   0.1628664
## 2     1975        2      3   0.4885993
## 3     1978        3      6   0.9771987
## 4     1979        1      7   1.1400651
## 5     1980        2      9   1.4657980
## 6     1982        2     11   1.7915309
## 7     1983        2     13   2.1172638
## 8     1984        2     15   2.4429967
## 9     1985        1     16   2.6058632
## 10    1986        4     20   3.2573290
## 11    1987        6     26   4.2345277
## 12    1988        1     27   4.3973941
## 13    1989        4     31   5.0488599
## 14    1990        3     34   5.5374593
## 15    1991       30     64  10.4234528
## 16    1992       36    100  16.2866450
## 17    1993       32    132  21.4983713
## 18    1994       27    159  25.8957655
## 19    1995       24    183  29.8045603
## 20    1996       25    208  33.8762215
## 21    1997       42    250  40.7166124
## 22    1998       43    293  47.7198697
## 23    1999       32    325  52.9315961
## 24    2000       31    356  57.9804560
## 25    2001       23    379  61.7263844
## 26    2002       27    406  66.1237785
## 27    2003       27    433  70.5211726
## 28    2004       29    462  75.2442997
## 29    2005       15    477  77.6872964
## 30    2006       15    492  80.1302932
## 31    2007       17    509  82.8990228
## 32    2008       14    523  85.1791531
## 33    2009       19    542  88.2736156
## 34    2010        8    550  89.5765472
## 35    2011        7    557  90.7166124
## 36    2012        7    564  91.8566775
## 37    2013        9    573  93.3224756
## 38    2014        4    577  93.9739414
## 39    2015       13    590  96.0912052
## 40    2016       10    600  97.7198697
## 41    2017        9    609  99.1856678
## 42    2018        5    614 100.0000000
write.table(table,'../nonmedicalparasites/ProductionPerYearForNonMedicalParasites',row.names=F,quote=F,sep='\t')
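
To read a specific threshold off the cumulative table, one option is to pick the first row whose cumulative percentage reaches it. This is a minimal sketch on made-up counts; `toy` and `first_year_reaching` are hypothetical names, not part of the analysis above.

```r
# made-up yearly counts, standing in for the AnnualProduction table
toy <- data.frame(Year = c(1990, 1991, 1992, 1993, 1994),
                  Articles = c(2, 8, 10, 15, 5))
toy$cumper <- cumsum(toy$Articles) / sum(toy$Articles) * 100

# first year in which the cumulative percentage reaches `pct`
first_year_reaching <- function(df, pct) {
  df$Year[which(df$cumper >= pct)[1]]
}

first_year_reaching(toy, 50)
```

On the real table the year column name may need adjusting, since the summary output pads it with trailing spaces.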


ggplot(citations_nomed_ana.sum$AnnualProduction, aes(`Year   `,Articles, group=1)) +
  geom_point( size = 3,colour='red') +
  geom_line() +
  labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

#create some splines to smooth the curve
spline_int <- as.data.frame(spline(citations_nomed_ana.sum$AnnualProduction$`Year   `, citations_nomed_ana.sum$AnnualProduction$Articles))

ggplot(citations_nomed_ana.sum$AnnualProduction) + 
  geom_point(aes(`Year   `,Articles), size = 1) +
  geom_line(data = spline_int, aes(x,y)) +
  geom_area(data = spline_int, aes(x,y,fill='red'),alpha=0.6) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
  scale_fill_manual(labels = "Parasites", values = alpha("red",.6))

#word cloud
forwordcloud_nomed<-as.data.frame(cbind(as.character(trimws(citations_nomed_ana.sum$MostRelKeywords$`Author Keywords (DE)     `, which = c("both", "left", "right"))),citations_nomed_ana.sum$MostRelKeywords[2]),stringsAsFactors=FALSE)
colnames(forwordcloud_nomed)<-c('keyword','count_papers')

forwordcloud_nomed<- forwordcloud_nomed %>% filter(!grepl('allozyme|electrophoresis|isoenzyme|isozyme|rapd|carbonic anhydrase|aflp|creatine kinase|protein kinase|alkaline phosphatase|cytochrome P450|glutathione S-transferase|alcohol dehydrogenase|lactate dehydrogenase|catalase|aldehyde dehydrogenase|hexokinase|peroxidase|5 alpha-reductase',keyword,ignore.case = TRUE))

#merge GENETICS into GENETIC, then create the corpus
forwordcloud_nomed<-forwordcloud_nomed %>%  mutate(fixkeyword=sub("GENETICS", "GENETIC", keyword)) 

forwordcloud_nomed.Corpus<-Corpus(VectorSource(forwordcloud_nomed[rep(row.names(forwordcloud_nomed), forwordcloud_nomed$count_papers), 3]))

wordcloud(forwordcloud_nomed.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,scale=c(2.2,.6))

#whole phrases
wordcloud(tolower(forwordcloud_nomed$keyword),as.numeric(forwordcloud_nomed$count_papers), colors=brewer.pal(8, "Set1"),max.words=30,scale=c(1.3,.6))

wordcloud(tolower(forwordcloud_nomed$keyword),as.numeric(forwordcloud_nomed$count_papers), colors=brewer.pal(8, "Dark2"),vfont=c("script","bold"),max.words=30,rot.per=0,scale=c(1.5,.6))

wordcloud(tolower(forwordcloud_nomed$keyword),as.numeric(forwordcloud_nomed$count_papers), colors=brewer.pal(8, "Dark2"),family = "mono",font = 2,max.words=30,scale=c(1.3,.6))
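
The corpus step above repeats each keyword once per paper, so that single-word frequencies reflect paper counts. A tiny base R sketch of that expansion on made-up keywords (`kw` is a hypothetical data frame, and the word split is only an approximation of what the corpus-based word cloud does):

```r
kw <- data.frame(keyword = c("GENE FLOW", "HOST RACES", "PCR"),
                 count_papers = c(3, 1, 2),
                 stringsAsFactors = FALSE)

# repeat each row index count_papers times, then pull the keyword column
expanded <- kw[rep(seq_len(nrow(kw)), kw$count_papers), "keyword"]

# split into single words and count them
words <- unlist(strsplit(tolower(expanded), " "))
table(words)
```

The whole-phrase word clouds skip this step because they pass the keyword and its count to `wordcloud` directly.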

Combining the three searches into one plot by year

Here I'm just joining it onto the dmerged data frame that we used earlier. It already has the parasite and broad-scale searches; we just need to add the non-medical parasites.

df_nonmed<-citations_nomed_ana.sum$AnnualProduction 
colnames(df_nonmed) <- c("Year", "ArticlesParasite_nonmed")
df_nonmed<-df_nonmed %>%  arrange(.,Year) %>%  mutate(PercentPerYearParasites_nonmed=cumsum(ArticlesParasite_nonmed)/sum(ArticlesParasite_nonmed)*100)
df_nonmed$Year <- as.character(df_nonmed$Year)

head(dmerged)
## # A tibble: 6 x 5
##    Year ArticlesGeneral PercentPerYearGeneral ArticlesParasite PercentPerYearParasites
##   <int>           <int>                 <dbl>            <dbl>                   <dbl>
## 1  1960               1               0.00259                0                  0     
## 2  1962               9               0.0259                 0                  0     
## 3  1963              16               0.0674                 0                  0     
## 4  1964              30               0.145                  0                  0     
## 5  1965              31               0.225                  0                  0     
## 6  1966              45               0.342                  1                  0.0664
dmerged$Year <- as.character(dmerged$Year)
dmerged3<-full_join(dmerged,df_nonmed,by='Year') 
dmerged3[is.na(dmerged3)] <- 0 
dmerged3$Year<-as.integer(dmerged3$Year)
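
A small sketch of what the join and zero-fill do, on made-up counts. The data frames `general` and `nonmed` are hypothetical stand-ins for `dmerged` and `df_nonmed`.

```r
library(dplyr)

general <- data.frame(Year = c("1990", "1991", "1992"), ArticlesGeneral = c(5, 7, 9))
nonmed  <- data.frame(Year = c("1991", "1993"), ArticlesParasite_nonmed = c(2, 4))

# full_join keeps every year that appears in either table
joined <- full_join(general, nonmed, by = "Year")

# years missing from one table come through as NA; treat them as zero articles
joined[is.na(joined)] <- 0
joined$Year <- as.integer(joined$Year)
joined
```

Year is joined as character in both tables so the key types match, then converted back to integer for plotting.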

#let's drop 2019 because it's a bit of a dumb point
dmerged3 %>% select(.,ArticlesGeneral,ArticlesParasite,ArticlesParasite_nonmed,Year) %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 1:3) %>% ggplot(aes(Year, value)) + 
    geom_point(aes(colour = factor(id)),size = 1) +
    geom_line(aes(colour = factor(id))) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset",color = "Article Type\n") +
  scale_y_continuous(trans='sqrt')

#for splines
dmerged3<-dmerged3 %>% filter(.,Year!=2019)
spline_int <- as.data.frame(spline(dmerged3$Year, dmerged3$ArticlesParasite))
spline_int2 <- as.data.frame(spline(dmerged3$Year, dmerged3$ArticlesGeneral))
spline_int3 <- as.data.frame(spline(dmerged3$Year, dmerged3$ArticlesParasite_nonmed))
spline_int$y[spline_int$y < 0] <- 0
spline_int3$y[spline_int3$y < 0] <- 0

#note: the fill strings below are only legend keys; scale_fill_manual sorts them (blue, green, red) and assigns labels and colours in that order
ggplot(dmerged3) + 
  geom_point(aes(Year,ArticlesGeneral), col='red',size = 1) +
  geom_point(aes(Year,ArticlesParasite), col='blue',size = 1) +
  geom_point(aes(Year,ArticlesParasite_nonmed), col='green',size = 1) +
  geom_line(data = spline_int2, aes(x,y)) +
  geom_area(data = spline_int2, aes(x,y,fill='blue')) +
  geom_line(data = spline_int, aes(x,y)) +
  geom_area(data = spline_int, aes(x,y,fill='red')) +
  geom_line(data = spline_int3, aes(x,y)) +
  geom_area(data = spline_int3, aes(x,y,fill='green')) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset") +
  scale_fill_manual(labels = c("Everyone", "Parasites not medical","Parasites"), values = alpha(c("red", "green","blue"),.6)) +
  scale_y_continuous(trans='sqrt')
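
The lines that set negative spline values to zero are there because an interpolating spline can dip below zero around a sharp jump. A minimal sketch on made-up counts (`years`, `counts` and `sp` are hypothetical names):

```r
# made-up yearly counts with a sharp jump, similar in shape to the real series
years  <- 1986:1993
counts <- c(4, 6, 1, 4, 3, 30, 36, 32)

# spline() returns a list with x and y; by default it interpolates at 3 * length(x) points
sp <- as.data.frame(spline(years, counts))

# a negative value is impossible for a count and has no square root,
# so clamp anything below zero before plotting on the sqrt axis
sp$y[sp$y < 0] <- 0
range(sp$y)
```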

Just some ideas for playing around

You should have a play around with the data and see what you can find. I have just given you some broad ideas; explore the data in your own way. The files from the 500-at-a-time download (option one) have a lot of other metadata that you could explore.

#depending on how the data comes down we can look at other things as well
head(citations_df$TC)
## [1] 0 0 0 1 0 0
dmerged.violin<-data.frame(citationcount=citations_df$TC,type='Parasite Search')
dmerged.violin<-rbind(dmerged.violin,data.frame(citationcount=citations_nonmed_df$TC,type='Parasite no medical'))

ggplot(dmerged.violin,aes(type,citationcount))  +
  geom_violin(aes(fill = factor(type))) +
  scale_y_continuous(trans='sqrt')+
  labs(title="Citation count per article",x='Search Group', y=expression(sqrt(italic('Citation Count'))), fill="Subset") +
  theme_bw()+
  scale_fill_manual(values = alpha(c("red", "blue"),.6)) 

#could look at citations over the years
para_citationperyear<-citations_df %>% select(.,PY,TC) %>% group_by(PY) %>% tally(TC)
nomed_citationperyear<-citations_nonmed_df %>% select(.,PY,TC) %>% group_by(PY) %>% tally(TC)
colnames(para_citationperyear) <- c("Year", "CitationParasites")
colnames(nomed_citationperyear) <- c("Year", "CitationParasitesNoMedical")
dmerged.citationPY<-full_join(para_citationperyear,nomed_citationperyear,by='Year')
dmerged.citationPY[is.na(dmerged.citationPY)] <- 0 
head(dmerged.citationPY)
## # A tibble: 6 x 3
##    Year CitationParasites CitationParasitesNoMedical
##   <dbl>             <dbl>                      <dbl>
## 1  1966                 4                          4
## 2  1968                 0                          0
## 3  1973                 0                          0
## 4  1974                29                          0
## 5  1975                13                         13
## 6  1977               258                          0
dmerged.citationPY %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 2:3) %>%  ggplot(aes(Year, value)) + 
    geom_point(aes(colour = factor(id)),size = 1) +
    geom_line(aes(colour = factor(id))) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Citation Count'))), fill="Subset",color = "Article Type\n") +
  scale_y_continuous(trans='sqrt')